
We propose a convex optimization formulation with the nuclear norm and $\ell_1$-norm to find a large
approximately rank-one submatrix of a given nonnegative matrix. We develop optimality conditions
for the formulation and characterize the properties of the optimal solutions. We establish conditions
under which the optimal solution of the convex formulation has a specific sparse structure. Finally,
we show that, under certain hypotheses, with high probability, the approach can recover the rank-one
submatrix even when it is corrupted with random noise and inserted as a submatrix into a much
larger random noise matrix.

1 Introduction 

Given a nonnegative matrix $A \in \mathbb{R}^{m \times n}$, $A(i,j) \ge 0$ for all $i = 1, \ldots, m$, $j = 1, \ldots, n$, we consider the
problem of finding $\mathcal{I} \subset \{1, \ldots, m\}$ and $\mathcal{J} \subset \{1, \ldots, n\}$ such that $A(\mathcal{I}, \mathcal{J})$ is close to a rank-one
matrix, and such that $\|A(\mathcal{I}, \mathcal{J})\|$ is large. We shall call this problem the LAROS problem (for "large
approximately rank-one submatrix").

The main application of the LAROS problem is for finding features in data. For example, suppose 
$A$ represents a corpus of documents in some language. Each column of $A$ is in correspondence with
one document, and each row is in correspondence with a term used in the corpus. Here, "term" means
a word in the language, excluding common words such as articles and prepositions. The $(i,j)$ entry of
$A$ is the number of occurrences of term $i$ in document $j$, perhaps normalized. Such a matrix is called
the term-document matrix of the underlying corpus.
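
As a small illustration of the term-document construction just described, the sketch below builds such a matrix for a toy corpus. The corpus, the stop-word list, and the helper name `term_document_matrix` are hypothetical choices made for this example only; no normalization is applied.

```python
from collections import Counter

# Hypothetical stop-word list standing in for "articles and prepositions".
STOP_WORDS = {"the", "a", "of", "in", "on", "and"}

def term_document_matrix(documents):
    """Return (terms, A) where A[i][j] counts occurrences of term i in document j."""
    counts = [Counter(w for w in doc.lower().split() if w not in STOP_WORDS)
              for doc in documents]
    terms = sorted(set().union(*counts))          # one row per distinct term
    A = [[c[t] for c in counts] for t in terms]   # a missing term counts as 0
    return terms, A

docs = ["the cat sat on the mat", "the dog sat", "a cat and a dog"]
terms, A = term_document_matrix(docs)
```

Rows of `A` correspond to terms and columns to documents, matching the convention in the text.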

In this case, an approximately rank-one submatrix of A corresponds to a subset of terms and a 
subset of documents in which the selected terms occur with proportional frequencies in the selected 
documents. Such a submatrix may correspond to the intuitive notion of a topic that recurs in several 
documents, since a topic may manifest itself as a particular group of relevant terms that occur roughly 
in the same proportions. 

As another example, the matrix A may correspond to a database of pixelated grayscale images, 
where each image has the same pixel size. Each column of A corresponds to one image, and each row to 
one pixel position. The $(i,j)$ entry of $A$ is the intensity of the $i$th pixel in the $j$th image. In this case,
the approximately rank-one submatrix corresponds to a visual feature that recurs in a certain position 
in some subset of the images. 

If one wanted to find more than one topic in a term-document matrix or more than one feature in an 
image database matrix, then one could iteratively find an approximately rank-one submatrix, subtract 
it from $A$ (perhaps modifying the result of the subtraction to ensure that $A$ remains nonnegative), and
then repeat the procedure $p$ times. Let the submatrices discovered be denoted $(\mathcal{I}_1, \mathcal{J}_1), \ldots, (\mathcal{I}_p, \mathcal{J}_p)$.
Suppose $A(\mathcal{I}_i, \mathcal{J}_i) \approx w_i h_i^T$ for $i = 1, \ldots, p$, and let $\bar{w}_i, \bar{h}_i$ denote the extension of $w_i, h_i$ to vectors of
length $m, n$ by inserting zeros for entries not in $\mathcal{I}_i, \mathcal{J}_i$ respectively.

It is known that if $\bar{A}$ is a nonnegative matrix representing a submatrix of $A$, then the minimizer $wh^T$
of $\|\bar{A} - wh^T\|$ is the dominant singular vector pair in either the Frobenius or 2-norm (a consequence of
the Eckart-Young theorem, Theorem 2.5.3 of [11]), and furthermore, $w \ge 0$ and $h \ge 0$ (a consequence
of the Perron-Frobenius theorem). Thus, without loss of generality, we may assume that each $w_i h_i^T$
determined by the iterative computation is nonnegative.

In this case, one has an approximate factorization 

$$A \approx [\bar{w}_1, \ldots, \bar{w}_p][\bar{h}_1, \ldots, \bar{h}_p]^T,$$

where we can write the right-hand side as $WH^T$ with $W \ge 0$, $H \ge 0$. This factorization is called
a nonnegative matrix factorization of A. The earliest reference known to us concerning nonnegative 
matrix factorization is Thomas' solution [18] to a problem posed by A. Berman and R. Plemmons 
(which, according to a remark in the journal, was also solved by A. Ben-Israel). Cohen and Rothblum
[8] describe applications for NMF in probability, quantum mechanics and other fields. Lee and Seung
[13] showed that NMF can find features in image databases, and Hofmann [12] showed that probabilistic
latent semantic analysis, a variant of NMF, can effectively cluster documents according to their topics. 

Nonnegative matrix factorization is sometimes posed as an optimization problem: find $W \in \mathbb{R}^{m \times p}$
and $H \in \mathbb{R}^{n \times p}$, both nonnegative, such that $\|A - WH^T\|$ is minimized in some matrix norm. It
is known that this optimization problem is NP-hard [19]. Therefore, it is not surprising that most
algorithms for the problem are heuristic in the sense that they do not make guarantees about the 
quality of the approximation. 

One class of heuristic NMF algorithms are the 'greedy' algorithms [2j El SI 02] that follow the 
framework described above. In a greedy algorithm, the columns of W and H are generated sequentially, 
with each new pair of columns accounting for one feature in the original A. These greedy algorithms 
give rise to the LAROS subproblem addressed in this paper, namely, find one pair $w_i, h_i$ nonnegative
such that $w_i h_i^T$ is a good approximation for a submatrix of $A$ in the positions $(i, j)$ where $A$ is positive.

The LAROS subproblem, however, is itself NP-hard, as observed by [10]. This is because the
maximum-edge biclique problem can be naturally expressed as a rank-one submatrix problem. The 
biclique problem takes as input a bipartite graph G = (U, V, E). The output is composed of two subsets 
$U^* \subset U$ and $V^* \subset V$ such that $U^* \times V^* \subset E$ (i.e., all possible $|U^*| \cdot |V^*|$ edges between $U^*$ and $V^*$
are present in $G$) and such that $|U^*| \cdot |V^*|$ is maximum with this property. This problem was shown by
Peeters [15] to be NP-hard. 

Maximum-edge biclique can be expressed as finding a large rank-one submatrix using the following
construction. Let $A$ be a $|U| \times |V|$ matrix with rows in correspondence to $U$ and columns in correspondence
to $V$. The entry of $A$ for $(i, j) \in U \times V$ is 1 if $(i, j) \in E$; otherwise this entry is 0. Then a biclique
corresponds exactly to a $|U^*| \times |V^*|$ submatrix of all 1's. A submatrix of all 1's is a rank-one matrix of
norm $(|U^*| \cdot |V^*|)^{1/2}$, and there is no other kind of rank-one submatrix of $A$.
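
The construction above can be sketched in a few lines. The graph, the helper names `biclique_matrix` and `is_biclique`, and the chosen index sets are illustrative assumptions; the sketch only checks a candidate biclique and does not search for a maximum one (which is the NP-hard part).

```python
def biclique_matrix(U, V, E):
    """0/1 matrix with rows indexed by U and columns by V; entry is 1 iff (u, v) is an edge."""
    edges = set(E)
    return [[1 if (u, v) in edges else 0 for v in V] for u in U]

def is_biclique(A, rows, cols):
    """True if the submatrix A(rows, cols) consists entirely of ones."""
    return all(A[i][j] == 1 for i in rows for j in cols)

U = ["u1", "u2", "u3"]
V = ["v1", "v2", "v3"]
E = [("u1", "v1"), ("u1", "v2"), ("u2", "v1"), ("u2", "v2"), ("u3", "v3")]
A = biclique_matrix(U, V, E)

rows, cols = [0, 1], [0, 1]   # candidate U* = {u1, u2}, V* = {v1, v2}
# For an all-ones block the Frobenius norm equals the 2-norm, (|U*| |V*|)^(1/2).
block_norm = sum(A[i][j] ** 2 for i in rows for j in cols) ** 0.5
```

Here the 2-by-2 block of ones has norm $(2 \cdot 2)^{1/2} = 2$, in line with the formula in the text.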

We will provide a formal definition of the LAROS problem, i.e., exactly what is the desired output,
in Section 2. (Some authors mentioned earlier, e.g., [1] and [10], have provided other formal definitions.)
We will also propose a convex optimization problem in Section 2 that, for matrices $A$ constructed
in a certain way, successfully finds large, approximately rank-one submatrices. We present two such
theorems. One case is when the approximately rank-one submatrix dominates the rest of the matrix;
this is presented in Section 3.

The second case is when $A$ is constructed as follows:

$$A = A_0 + R,$$

where there are two index sets $\mathcal{I}, \mathcal{J}$ such that $\operatorname{rank}(A_0(\mathcal{I}, \mathcal{J})) = 1$, $A_0(i,j) = 0$ for all $(i,j) \notin \mathcal{I} \times \mathcal{J}$,
and $R$ is a random matrix representing noise. In this case, under certain assumptions, our algorithm
recovers $(\mathcal{I}, \mathcal{J})$ from $A$, as proved in Section 4.

2 Matrix norm minimization 

For reasons that will become clear, we start this section by presenting the convex relaxation of the 
LAROS problem, and only later will we present a nonconvex exact optimization formulation. In par- 
ticular, we propose the following convex optimization problem that in some cases solves the LAROS 
problem: 

$$\begin{array}{rl} \min & \|X\|_* + \theta \|X\|_1 \\ \text{s.t.} & \langle A, X \rangle \ge 1. \end{array} \tag{1}$$

The matrix $X \in \mathbb{R}^{m \times n}$ is the unknown. Norm $\|\cdot\|_*$ is the nuclear norm, also called the trace norm;
it is the sum of the singular values of $X$. We use the notation $\|X\|_1$ to mean the sum of the absolute
values of entries of $X$, that is, the $\ell_1^{mn}$-norm applied to $\operatorname{vec}(X)$, the concatenation of the columns of
$X$ into a long vector. Finally, $\langle A, X \rangle$ means the inner product of the two matrices. We note that an
objective function involving a sum of the nuclear and $\ell_1$-norms was used for a different purpose by [7].
Two other norms used extensively in this paper are $\|X\|$, which is the spectral or 2-norm, i.e., $\sigma_1(X)$,
and $\|X\|_\infty$, which is the $\ell_\infty^{mn}$-norm applied to $\operatorname{vec}(X)$, i.e., the maximum absolute entry of $X$.

Before beginning a detailed analysis of this optimization problem, we first provide some motivation. 
Consider first the simplification obtained by taking $\theta = 0$:

$$\begin{array}{rl} \min & \|X\|_* \\ \text{s.t.} & \langle A, X \rangle \ge 1. \end{array}$$

It follows from Proposition 1 below that the optimal solution is found using the singular value
decomposition. In particular, if $A$ is factored as $A = U \Sigma V^T$, where $U \in \mathbb{R}^{m \times m}$ is orthogonal, $\Sigma \in \mathbb{R}^{m \times n}$
is diagonal, and $V \in \mathbb{R}^{n \times n}$ is orthogonal, then an optimizer is $X = U(:,1)V(:,1)^T/\sigma_1$. Thus, when
$\theta = 0$, the above formulation successfully finds the best rank-one approximation to the whole matrix $A$.
This approximation, however, is not always well suited for identifying submatrices. Consider e.g.
the following $6 \times 6$ matrix:

It is apparent from inspection that this matrix has two $3 \times 3$ approximately rank-one blocks in positions
{1, 2, 3} x {1, 2, 3} and {4, 5, 6} x {4, 5, 6}. If the 'noise' entries in the upper right {1, 2, 3} x {4, 5, 6} block 
were absent, then the two dominant singular vectors would exactly identify the two diagonal blocks. 
Once the noise entries are inserted, however, the dominant left singular vector of the above matrix A 
accurate to two decimal places is [.45, .37, .37, .40, .40, .43] T . In other words, there is no separation at 
all between the rows numbered 1, 2, 3 and those numbered 4, 5, 6, so no submatrix is identified. 
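
The $\theta = 0$ case described above can be checked numerically. The sketch below is a minimal pure-Python illustration, assuming a small nonzero nonnegative matrix and using power iteration in place of a full SVD; the function name `dominant_singular_triple` and the $2 \times 2$ test matrix are choices made for this example and are not the $6 \times 6$ matrix discussed in the text.

```python
import math

def dominant_singular_triple(A, iters=500):
    """Power iteration on A^T A; returns (sigma1, u, v) for a nonzero matrix A."""
    m, n = len(A), len(A[0])
    v = [1.0 / math.sqrt(n)] * n                     # positive starting vector
    for _ in range(iters):
        u = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]   # u = A v
        w = [sum(A[i][j] * u[i] for i in range(m)) for j in range(n)]   # w = A^T u
        norm_w = math.sqrt(sum(x * x for x in w))
        v = [x / norm_w for x in w]
    u = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
    sigma = math.sqrt(sum(x * x for x in u))
    u = [x / sigma for x in u]
    return sigma, u, v

A = [[2.0, 1.0], [1.0, 2.0]]                         # illustrative matrix only
sigma, u, v = dominant_singular_triple(A)
# Candidate optimizer X = u v^T / sigma_1 for the theta = 0 problem.
X = [[u[i] * v[j] / sigma for j in range(2)] for i in range(2)]
inner = sum(A[i][j] * X[i][j] for i in range(2) for j in range(2))
```

For this matrix $\sigma_1 = 3$, the constraint $\langle A, X \rangle \ge 1$ holds with equality, and the nuclear norm of $X$ is $1/\sigma_1$.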

Armed with a preliminary understanding of the convex relaxed formulation, we now present and 
motivate an exact (nonconvex) formulation of LAROS, which is as follows. 

$$\begin{array}{rl} \min & \|X\|_* + \theta\, |\mathcal{I}| \cdot |\mathcal{J}| \\ \text{s.t.} & \langle A, X \rangle \ge 1, \\ & X_{ij} = 0, \quad \forall (i,j) \notin \mathcal{I} \times \mathcal{J}, \end{array}$$

where $\mathcal{I} \subset \{1, \ldots, m\}$ and $\mathcal{J} \subset \{1, \ldots, n\}$ are unknowns (as well as $X$). For fixed $\mathcal{I}$ and $\mathcal{J}$, the
optimal solution would be the rank-one approximation of the submatrix $A(\mathcal{I}, \mathcal{J})$ given by the SVD, as
explained above, and the optimal value is $\|A(\mathcal{I}, \mathcal{J})\|^{-1} + \theta\, |\mathcal{I}| \cdot |\mathcal{J}|$. The first term of the optimal value
is $\|A(\mathcal{I}, \mathcal{J})\|^{-1}$; therefore, for appropriate selection of $\theta$, a large submatrix (in terms of 2-norm) will be
selected. The second term is the size of the submatrix, which is a nonconvex function. Thus, the two
terms balance the twin objectives of selecting a submatrix with a large first singular value and selecting
a submatrix that has a relatively small number of entries.

This now motivates (1): we relax the above nonconvex formulation by replacing the cardinality
term in the objective function with the $\ell_1$-norm. The relaxed term $\theta \|X\|_1$ in the objective function
has the well-known effect of favoring sparser matrices $X$ (those with fewer nonzero entries). Thus, the
combination of the two terms seeks a low-rank matrix with many entries equal to 0. For example,
our formulation (1) applied to $A$ above identifies the $\{4, 5, 6\} \times \{4, 5, 6\}$ submatrix when $\theta = 0.5$. In
particular, the solution $X$ has zeros in all positions except $\{4, 5, 6\} \times \{4, 5, 6\}$; in these positions it has
positive entries ranging from 0.08 to 0.16.

To start our more formal analysis, let us consider the following general norm minimization problem.

$$\begin{array}{rl} \min & |||X||| \\ \text{s.t.} & \langle A, X \rangle \ge 1, \end{array} \tag{2}$$

where $||| \cdot |||$ is an arbitrary norm function on $\mathbb{R}^{m \times n}$. (For example, the objective function $\|X\|_* + \theta \|X\|_1$
appearing in (1) is a norm.) Using the associated dual norm $||| \cdot |||^*$, we can relate Problem (2) to an
equivalent problem as follows.

Lemma 1. Consider $A \ne 0$. Matrix $X^*$ is an optimal solution of Problem (2) if and only if $Y^* =
|||A|||^* X^*$ is an optimal solution of the following problem:

$$\begin{array}{rl} \max & \langle A, Y \rangle \\ \text{s.t.} & |||Y||| \le 1. \end{array} \tag{3}$$

Proof. Let $X^*$ be an optimal solution of Problem (2). Clearly, $X^* \ne 0$ and $\langle A, X^* \rangle = 1$. Applying the
norm inequality, we have

$$|||X^*||| \cdot |||A|||^* \ge \langle A, X^* \rangle = 1 \iff |||X^*||| \ge \frac{1}{|||A|||^*}.$$

According to Boyd and Vandenberghe [5], the dual of the dual norm is the original norm and the
norm inequality is tight: for any $A$, there is always an $X \ne 0$ such that the equality holds (for
finite-dimensional vector spaces). Since $X^*$ is an optimal solution of Problem (2),

$$|||X^*||| = \frac{1}{|||A|||^*}.$$

Let $Y^* = |||A|||^* X^*$; we then have $|||Y^*||| = 1$ and $\langle A, Y^* \rangle = |||A|||^*$. We also have:

$$|||A|||^* = \begin{array}[t]{rl} \max & \langle A, Y \rangle \\ \text{s.t.} & |||Y||| \le 1. \end{array}$$

Thus $Y^*$ is indeed an optimal solution of Problem (3). Using similar arguments, we can prove that
conversely, if $Y^*$ is an optimal solution of Problem (3), then $X^* = (|||A|||^*)^{-1} Y^*$ is an optimal solution
of Problem (2). □

The next lemma characterizes the set of all optimal solutions of Problem (3).

Lemma 2. The set of all optimal solutions of Problem (3) with $A \ne 0$ is the subgradient of the dual
norm function $||| \cdot |||^*$ at $A$, $\partial |||A|||^*$.

Proof. Let $Y^*$ be an optimal solution of Problem (3); we have $|||Y^*||| = 1$ since $A \ne 0$. Thus we
have $|||A|||^* = \langle A, Y^* \rangle$. For an arbitrary matrix $B \in \mathbb{R}^{m \times n}$,

$$|||A + B|||^* \ge \langle A + B, Y^* \rangle = |||A|||^* + \langle B, Y^* \rangle.$$

Thus $Y^* \in \partial |||A|||^*$.

Now consider $Y \in \partial |||A|||^*$:

$$|||B|||^* \ge |||A|||^* + \langle B - A, Y \rangle \iff \langle A, Y \rangle - |||A|||^* \ge \langle B, Y \rangle - |||B|||^*, \quad \forall B \in \mathbb{R}^{m \times n}.$$

With $B = 0$ and $B = 2A$, we obtain the equality $\langle A, Y \rangle = |||A|||^* > 0$. We have:

$$\langle A, Y \rangle = |||A|||^* \le |||A|||^* \, |||Y||| \iff (|||Y||| - 1)\, |||A|||^* \ge 0 \iff |||Y||| \ge 1.$$

In addition, $\langle B, Y \rangle - |||B|||^* \le 0$ for all $B \in \mathbb{R}^{m \times n}$. The norm inequality $\langle B, Y \rangle \le |||B|||^* \, |||Y|||$ is
tight and $Y \ne 0$ ($|||Y||| \ge 1$); therefore, there exists $B \ne 0$ such that $\langle B, Y \rangle = |||B|||^* \, |||Y|||$. Thus we
have:

$$|||B|||^* \, |||Y||| - |||B|||^* \le 0 \iff (|||Y||| - 1)\, |||B|||^* \le 0 \implies |||Y||| \le 1.$$

Thus $|||Y||| = 1$ and $\langle A, Y \rangle = |||A|||^*$, the optimal value of Problem (3), which means $Y$ is an optimal
solution of Problem (3). □
Lemmas 1 and 2 show that the set of all optimal solutions of Problem (2) is $(|||A|||^*)^{-1} \partial |||A|||^*$. The
uniqueness of the optimal solution of Problem (2) is equivalent to the differentiability of the dual norm
function $||| \cdot |||^*$ at $A$. These results are summarized in the following theorem.

Theorem 1. Consider $A \ne 0$. The following statements are true:

(i) The set of optimal solutions of Problem (2) is $(|||A|||^*)^{-1} \partial |||A|||^*$.

(ii) Problem (2) has a unique optimal solution if and only if the dual norm function $||| \cdot |||^*$ is
differentiable at $A$.

If the norm is set to be the nuclear norm, we obtain the following minimization problem, which has 
been used [9j fT6j [61 Q] as a relaxation of rank minimization optimization problems: 

$$\begin{array}{rl} \min & \|X\|_* \\ \text{s.t.} & \langle A, X \rangle \ge 1. \end{array} \tag{4}$$

The dual norm of the nuclear norm is the spectral norm. According to Ziętak [20], if $A = U \Sigma V^T$
is a singular value decomposition of $A$ and $s$ is the multiplicity of the largest singular value of $A$, the
subgradient $\partial \|A\|$ is written as follows:

$$\partial \|A\| = \left\{ U \begin{bmatrix} S & 0 \\ 0 & 0 \end{bmatrix} V^T : S \in \mathcal{S}_+^s, \ \|S\|_* = 1 \right\}.$$



Clearly, the largest rank-one approximation of $A$, $u_1 v_1^T$, always belongs to the subgradient $\partial \|A\|$. The
description of the subgradient $\partial \|A\|$ shows that the maximum possible rank of an optimal solution
of Problem (4) is the multiplicity of the largest singular value of $A$. In addition, the spectral norm
function $\| \cdot \|$ is not differentiable in general. The uniqueness of the optimal solution of Problem (4) is
equivalent to the differentiability of the spectral norm function $\| \cdot \|$ at $A$. The necessary and sufficient
condition is $s = 1$ or, equivalently, $\sigma_1(A) > \sigma_2(A)$. In the case of a unique optimal solution, we obtain
the largest rank-one approximation of $A$ (up to the scaling factor $\|A\|^{-1}$). These results are stated in
the following proposition:

Proposition 1. Consider $A \ne 0$. The following statements are true:

(i) The set of optimal solutions of Problem (4) is $\|A\|^{-1} \partial \|A\|$.

(ii) The largest rank-one approximation of $A$ is an optimal solution of Problem (4) and it is the unique
solution if and only if $\sigma_1(A) > \sigma_2(A)$.

Similar to low-rank minimization problems with nuclear norm approximation, sparse optimization
problems can be approximately handled by the (vector) $\ell_1$-norm function $\| \cdot \|_1$. Let us consider the
following problem:

$$\begin{array}{rl} \min & \|X\|_1 \\ \text{s.t.} & \langle A, X \rangle \ge 1. \end{array} \tag{5}$$

The dual norm of the $\ell_1$-norm is the (vector) infinity norm $\| \cdot \|_\infty$, i.e., the maximum absolute entry of the
matrix, and the subgradient $\partial \|A\|_\infty$ can be written as follows:

$$\partial \|A\|_\infty = \operatorname{conv}\left\{ \operatorname{sgn}(a_{ij}) E_{ij} \ \middle|\ (i,j) \in \arg\max_{(k,l)} |a_{kl}| \right\},$$

where $E_{ij}$ is the unit matrix in $\mathbb{R}^{m \times n}$ with $E_{ij}(i,j) = 1$. The sparsity of the optimal solution $X^*$
of Problem (5) is clearly related to the multiplicity of the maximum absolute value of elements of $A$.
Applying Theorem 1 for this particular $\ell_1$-norm, we obtain the following results:

Proposition 2. Consider $A \ne 0$. The following statements are true:

(i) The set of optimal solutions of Problem (5) is $\|A\|_\infty^{-1} \partial \|A\|_\infty$.

(ii) The matrix $\operatorname{sgn}(a_{ij}) E_{ij}$, where $(i,j) \in \arg\max_{(k,l)} |a_{kl}|$ and $\operatorname{sgn}(\cdot)$ is the usual sign function, is an
optimal solution of Problem (5) (up to the scaling factor $\|A\|_\infty^{-1}$) and it is the unique solution if and only if $|a_{ij}| > |a_{kl}|$ for all
$(k,l) \ne (i,j)$.
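
A minimal sketch of the $\ell_1$-norm case in Proposition 2: it places a single nonzero entry at a position of maximum absolute value, scaled so that the constraint $\langle A, X \rangle \ge 1$ holds with equality. The helper name `l1_problem_solution` and the test matrix are assumptions of this example; the scaling by $1/\|A\|_\infty$ follows part (i) of the proposition.

```python
def l1_problem_solution(A):
    """One optimal X for min ||X||_1 s.t. <A, X> >= 1: a single scaled entry."""
    m, n = len(A), len(A[0])
    # Position of a largest-magnitude entry (ties broken by first occurrence).
    i, j = max(((r, c) for r in range(m) for c in range(n)),
               key=lambda rc: abs(A[rc[0]][rc[1]]))
    a = A[i][j]
    if a == 0:
        raise ValueError("A must be nonzero")
    X = [[0.0] * n for _ in range(m)]
    X[i][j] = (1.0 if a > 0 else -1.0) / abs(a)   # sgn(a_ij) / ||A||_inf
    return X

A = [[1.0, -4.0], [2.0, 3.0]]                     # illustrative matrix only
X = l1_problem_solution(A)
inner = sum(A[i][j] * X[i][j] for i in range(2) for j in range(2))
l1_norm = sum(abs(x) for row in X for x in row)
```

In this example the unique maximum-magnitude entry is $-4$, so the solution is unique, with $\|X\|_1 = 1/4 = 1/\|A\|_\infty$.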

As mentioned above, finding a low-rank submatrix clearly involves both low-rank and sparse
optimization (with a specific sparse structure). Let us return to the parametric optimization problem (1)
proposed at the beginning of this section,

$$\begin{array}{rl} \min & \|X\|_* + \theta \|X\|_1 \\ \text{s.t.} & \langle A, X \rangle \ge 1, \end{array}$$

where $\theta \ge 0$. Clearly, if $\theta = 0$, we obtain Problem (4), and when $\theta \to \infty$, we approach Problem (5).
This optimization problem clearly addresses both low-rank and sparse requirements of the solution $X$.
We now would like to characterize the set of optimal solutions of the problem.

The objective function $\|X\|_* + \theta \|X\|_1$ is a norm function since $\theta \ge 0$. Denote $\|X\|_\theta$ to be this
parametric norm of $X$,

$$\|X\|_\theta := \|X\|_* + \theta \|X\|_1,$$

and consider its dual norm function $\| \cdot \|_\theta^*$. Clearly, Problem (1) is a special case of Problem (2). The
set of optimal solutions of Problem (1) can therefore be characterized as follows:

Proposition 3. Consider $A \ne 0$. The following statements are true:

(i) The set of optimal solutions of Problem (1) is $(\|A\|_\theta^*)^{-1} \partial \|A\|_\theta^*$.

(ii) There is a unique solution if and only if the dual norm function $\| \cdot \|_\theta^*$ is differentiable at $A$.

We now focus on deriving some properties of the dual norm $\| \cdot \|_\theta^*$. We have:

$$\|A\|_\theta^* = \begin{array}[t]{rl} \max & \langle A, X \rangle \\ \text{s.t.} & \|X\|_\theta \le 1. \end{array} \tag{6}$$

We will use the gauge function and its dual polar function (see Rockafellar [17] for more details) to
compute this dual norm.

Proposition 4. The dual norm $\|A\|_\theta^*$ with $\theta > 0$ is the optimal value of the following optimization
problem:

$$\|A\|_\theta^* = \begin{array}[t]{rl} \min & \max\{\|Y\|, \theta^{-1} \|Z\|_\infty\} \\ \text{s.t.} & Y + Z = A. \end{array} \tag{7}$$

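Proof. Consider the closed unit ball $C_* = \{X \in \mathbb{R}^{m \times n} \mid \|X\|_* \le 1\}$ with respect to the nuclear norm
and, similarly, the unit ball $C_1$ with respect to the $\ell_1$-norm $\| \cdot \|_1$. The polar of $C_*$ is the closed
unit ball with respect to the spectral norm, $C_*^\circ = C$. Similarly, we have $C_1^\circ = C_\infty$, the unit ball with
respect to the infinity norm.

Using the definition of gauge functions, we have $\|X\|_* = \gamma_{C_*}(X) = \min\{\lambda \ge 0 \mid X \in \lambda C_*\}$. In
addition, the support function $\sigma_S(X) = \max\{\langle X, Y \rangle \mid Y \in S\}$ is the gauge function of $S^\circ$ for every
symmetric closed bounded convex set $S$ with $0 \in \operatorname{int}(S)$. All unit balls satisfy these conditions; therefore,
we obtain the well-known results $\|X\|_* = \gamma_{C_*}(X) = \gamma_{C^\circ}(X) = \sigma_C(X)$ and
$\|X\|_1 = \gamma_{C_1}(X) = \gamma_{C_\infty^\circ}(X) = \sigma_{C_\infty}(X)$.

Now consider the unit ball $C_\theta = \{X \in \mathbb{R}^{m \times n} \mid \|X\|_* + \theta \|X\|_1 \le 1\}$; we have:

$$C_\theta = \{X \in \mathbb{R}^{m \times n} \mid \sigma_C(X) + \theta \sigma_{C_\infty}(X) \le 1\}.$$

Applying the definition of support functions, we have $\sigma_C(X) + \theta \sigma_{C_\infty}(X) = \sigma_{C + \theta C_\infty}(X)$, where
$C + \theta C_\infty$ is the Minkowski sum of the two sets $C$ and $\theta C_\infty$. This set satisfies all the conditions above;
therefore, $\sigma_{C + \theta C_\infty}(X) = \gamma_{(C + \theta C_\infty)^\circ}(X)$. Thus

$$C_\theta = \{X \in \mathbb{R}^{m \times n} \mid \gamma_{(C + \theta C_\infty)^\circ}(X) \le 1\} = (C + \theta C_\infty)^\circ.$$

We also have $\|A\|_\theta^* = \sigma_{C_\theta}(A)$. Thus

$$\|A\|_\theta^* = \gamma_{C_\theta^\circ}(A) = \gamma_{C + \theta C_\infty}(A).$$

We have $\gamma_{C + \theta C_\infty}(A) = \min\{\lambda \ge 0 \mid A \in \lambda (C + \theta C_\infty)\}$ or, equivalently,

$$\gamma_{C + \theta C_\infty}(A) = \begin{array}[t]{rl} \min & \lambda \\ \text{s.t.} & A = Y + Z, \\ & \|Y\| \le \lambda, \\ & \theta^{-1} \|Z\|_\infty \le \lambda, \\ & \lambda \ge 0. \end{array}$$

Rewriting the minimization problem above, we obtain the final result as shown in (7):

$$\|A\|_\theta^* = \begin{array}[t]{rl} \min & \max\{\|Y\|, \theta^{-1} \|Z\|_\infty\} \\ \text{s.t.} & Y + Z = A. \end{array}$$

□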
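
As an illustrative remark (not a result stated in the text), the characterization of the dual norm as a minimum over splittings $Y + Z = A$ immediately gives simple upper bounds: any feasible splitting bounds $\|A\|_\theta^*$ from above. Taking the two trivial splittings, and assuming $\theta > 0$:

```latex
\begin{aligned}
(Y, Z) = (A, 0):&\quad \|A\|_\theta^* \le \max\{\|A\|,\, 0\} = \|A\|, \\
(Y, Z) = (0, A):&\quad \|A\|_\theta^* \le \max\{0,\, \theta^{-1}\|A\|_\infty\} = \theta^{-1}\|A\|_\infty, \\
\text{hence}&\quad \|A\|_\theta^* \le \min\{\|A\|,\ \theta^{-1}\|A\|_\infty\}.
\end{aligned}
```

The first bound is the relevant one for small $\theta$ and the second for large $\theta$, which matches the two limiting regimes (nuclear norm only and $\ell_1$-norm only) discussed earlier in the section.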

We can now derive the optimality conditions for both Problems (6) and (7):

Lemma 3. Nonzero feasible solutions $X$ and $(Y, Z)$ are optimal for Problems (6) and (7), respectively,
if and only if they satisfy the conditions below:

(i) $\|Y\| = \theta^{-1} \|Z\|_\infty$,

(ii) $X \in \alpha \, \partial \|Y\|$, $\alpha \ge 0$,

(iii) $X \in \beta \, \partial \|Z\|_\infty$, $\beta \ge 0$, and

(iv) $\alpha + \theta \beta = 1$.

Proof. We first prove the weak duality result. Consider feasible solutions $X$ and $(Y, Z)$ for Problems
(6) and (7), respectively; we have:

$$\begin{aligned}
\langle X, A \rangle &= \langle X, Y \rangle + \langle X, Z \rangle \\
&\le \|X\|_* \|Y\| + \|X\|_1 \|Z\|_\infty \\
&\le \|X\|_* \max\{\|Y\|, \theta^{-1} \|Z\|_\infty\} + \theta \|X\|_1 \max\{\|Y\|, \theta^{-1} \|Z\|_\infty\} \\
&= (\|X\|_* + \theta \|X\|_1) \max\{\|Y\|, \theta^{-1} \|Z\|_\infty\} \\
&\le \max\{\|Y\|, \theta^{-1} \|Z\|_\infty\},
\end{aligned}$$

where the last inequality uses the feasibility of $X$, $\|X\|_\theta \le 1$.

The strong duality result shows that $X$ and $(Y, Z)$ are the optimal solutions if and only if $\langle X, A \rangle =
\max\{\|Y\|, \theta^{-1} \|Z\|_\infty\}$. This happens if and only if all the conditions below are satisfied:

(i) $\langle X, Y \rangle = \|X\|_* \|Y\|$ and $\langle X, Z \rangle = \|X\|_1 \|Z\|_\infty$,

(ii) $\|Y\| = \max\{\|Y\|, \theta^{-1} \|Z\|_\infty\} = \theta^{-1} \|Z\|_\infty$, and

(iii) $\|X\|_* + \theta \|X\|_1 = 1$.

The two equalities in the first condition are equivalent to the fact that $X \in \alpha \, \partial \|Y\|$, where $\alpha = \|X\|_*$, and
$X \in \beta \, \partial \|Z\|_\infty$, where $\beta = \|X\|_1$. The second condition is simply $\|Y\| = \theta^{-1} \|Z\|_\infty$ and the third
condition is equivalent to $\alpha + \theta \beta = 1$. Thus we have proved the necessary and sufficient optimality
conditions for Problems (6) and (7). □

Using these optimality conditions, we can obtain simple sufficient conditions for the uniqueness of
the optimal solution $X$:

Proposition 5. Consider a feasible solution $X$ of Problem (6). If there exists $(Y, Z)$ that satisfies
the conditions below,

(i) $Y + Z = A$ and $\|Y\| = \theta^{-1} \|Z\|_\infty$,

(ii) $X \in \alpha \, \partial \|Y\|$, $\alpha \ge 0$,

(iii) $X \in \beta \, \partial \|Z\|_\infty$, $\beta \ge 0$,

(iv) $\alpha + \theta \beta = 1$, and

(v) $\| \cdot \|$ is differentiable at $Y$ or $\| \cdot \|_\infty$ is differentiable at $Z$,

then $X$ is the unique optimal solution of Problem (6).

Proof. Using the first four conditions, we can prove that $X$ is an optimal solution of Problem (6) and
$(Y, Z)$ is an optimal solution of Problem (7). Now assume that $\| \cdot \|$ is differentiable at $Y$; then
$\partial \|Y\|$ is a singleton, $\partial \|Y\| = \{V\}$. Thus we have:

$$\|X\|_1 = \alpha \|V\|_1 = \beta \implies \alpha (1 + \theta \|V\|_1) = 1 \implies \alpha = \frac{1}{1 + \theta \|V\|_1}.$$

Assume there is another optimal solution $\bar{X} \ne X$ of Problem (6). Applying Lemma 3, we will have
$\bar{X} \in \bar{\alpha} \, \partial \|Y\|$ and similarly $\bar{X} \in \bar{\beta} \, \partial \|Z\|_\infty$ with $\bar{\alpha} + \theta \bar{\beta} = 1$. The same calculation results in $\bar{\alpha} = \alpha$
(contradiction). Thus $X$ is the unique optimal solution of Problem (6). Similar arguments can be used
to prove the uniqueness of $X$ if $\| \cdot \|_\infty$ is differentiable at $Z$. □

Proposition 5 relies on dual solutions $Y$ and $Z$ to show the uniqueness of the primal solution $X$.
Next, we will focus on the low-rank and sparse property of the optimal solution $X$ for different values
of $\theta$. The following theorem provides sufficient conditions on the matrix $A$ for the rank-one property
(and uniqueness) of the optimal solution $X$ when $\theta$ is small enough.

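Theorem 2. If $A$ satisfies the condition $\sigma_1(A) > \sigma_2(A)$, then Problem (6) has a (unique) rank-one
optimal solution $X$ for all $0 \le \theta < \theta_A$, where

$$\theta_A = \frac{1}{\sqrt{mn}} \cdot \frac{\sigma_1(A) - \sigma_2(A)}{3\sigma_1(A) - \sigma_2(A)}.$$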

Proof. The optimality conditions in Lemma 3 show that there exist $Y$ and $Z$ such that $A = Y + Z$,
$\|Z\|_\infty = \theta \|Y\|$ and $X \in \alpha \, \partial \|Y\|$. Applying a standard perturbation theorem of singular values (see
Cor. 8.6.2 of [11]), we have:

$$|\sigma_i(A) - \sigma_i(Y)| \le \|Z\|, \quad i = 1, 2.$$

We also have $\|Z\| \le \sqrt{mn} \, \|Z\|_\infty$. Thus

$$\|Z\| \le \sqrt{mn} \, (\theta \|Y\|) = \sqrt{mn} \, (\theta \sigma_1(Y)).$$

For all $0 \le \theta < \theta_A$, we have:

$$\sigma_1(Y) \le \sigma_1(A) + \|Z\| \le \sigma_1(A) + \sqrt{mn} \, (\theta \sigma_1(Y)) < \sigma_1(A) + \sqrt{mn} \, (\theta_A \sigma_1(Y)).$$

This implies

$$(1 - \theta_A \sqrt{mn}) \, \sigma_1(Y) < \sigma_1(A).$$

We then have:

$$\frac{2\sigma_1(A)}{3\sigma_1(A) - \sigma_2(A)} \, \sigma_1(Y) < \sigma_1(A) \iff \sigma_1(Y) < \frac{1}{2} (3\sigma_1(A) - \sigma_2(A)).$$

Thus

$$\|Z\| \le \sqrt{mn} \, (\theta \sigma_1(Y)) < \sqrt{mn} \, (\theta_A \sigma_1(Y)) < \frac{1}{2} (\sigma_1(A) - \sigma_2(A)),$$

and therefore

$$\sigma_1(Y) \ge \sigma_1(A) - \|Z\| > \frac{1}{2} (\sigma_1(A) + \sigma_2(A)) > \sigma_2(A) + \|Z\| \ge \sigma_2(Y).$$

We have $\sigma_1(Y) > \sigma_2(Y)$; therefore, $\| \cdot \|$ is differentiable at $Y$. According to Proposition 5, $X$ is
the unique rank-one optimal solution of Problem (6). □
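
For a concrete feel for the size of the threshold in Theorem 2, the sketch below evaluates it for given singular values. It assumes the threshold reads $\theta_A = \frac{1}{\sqrt{mn}} \cdot \frac{\sigma_1(A) - \sigma_2(A)}{3\sigma_1(A) - \sigma_2(A)}$, which is the form consistent with the steps of the proof above; the function name `theta_A` and the sample values are illustrative assumptions.

```python
import math

def theta_A(sigma1, sigma2, m, n):
    """Threshold of Theorem 2 under the assumed formula (see lead-in)."""
    if not sigma1 > sigma2 >= 0:
        raise ValueError("need sigma1 > sigma2 >= 0")
    return (sigma1 - sigma2) / (math.sqrt(m * n) * (3 * sigma1 - sigma2))

# Example: a 2 x 2 matrix with singular values 3 and 1.
value = theta_A(3.0, 1.0, 2, 2)
```

With $\sigma_1 = 3$, $\sigma_2 = 1$ and $m = n = 2$, the bound is $\frac{1}{2} \cdot \frac{2}{8} = 0.125$; under this reading the threshold shrinks as the matrix grows or as the gap between the two largest singular values closes.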

The last result of this section concerns the nonnegativity of $X$. If $A$ is nonnegative, then one might
expect $X$ to be nonnegative. For $\theta = 0$ or $\theta = \infty$, this is certainly true by preceding results in this
section. It is not necessarily true for intermediate values of $\theta$. The following theorem shows
that, at least in the rank-one case, nonnegativity is assured.

Theorem 3. Consider the set of optimal solutions of Problem (6) when $A \ge 0$. We have:

(i) If Problem (6) has a rank-one optimal solution, then there exists a nonnegative rank-one optimal
solution.

(ii) If $\theta > 1$, then all optimal solutions of Problem (6) are nonnegative.
Proof.

(i) Consider a rank-one optimal solution $X$, $X = \sigma u v^T$, of Problem (6). We prove that
$|X| = \sigma |u| |v|^T \ge 0$ is also an optimal solution. Let $\bar{X}$ denote $|X|$. We have:

$$\|\bar{X}\|_\theta = \|\bar{X}\|_* + \theta \|\bar{X}\|_1 = \|X\|_* + \theta \|X\|_1.$$

In addition, $\langle A, \bar{X} \rangle \ge \langle A, X \rangle$ since $A \ge 0$. Thus clearly $\bar{X}$ is also an optimal solution.

(ii) Assume that there exists an optimal solution $X$ of Problem (6) that is not nonnegative. Without loss
of generality, assume $x_{11} < 0$. Consider $X(\epsilon) = X + \epsilon E_{11}$, where $\epsilon > 0$ and $E_{11}$ is the matrix of
all zeros except the element $E_{11}(1,1) = 1$; we have:

$$\|X(\epsilon)\|_* \le \|X\|_* + \epsilon \|E_{11}\|_* = \|X\|_* + \epsilon.$$

In addition, $\|X(\epsilon)\|_1 = \|X\|_1 - \epsilon$ if $\epsilon < |x_{11}|$. Therefore, we have:

$$\|X(\epsilon)\|_\theta \le \|X\|_\theta + (1 - \theta)\epsilon = 1 + (1 - \theta)\epsilon < 1, \quad \forall\, 0 < \epsilon < |x_{11}|.$$

Here we assume that $A \ne 0$; therefore, $\|X\|_\theta = 1$. We also have

$$\langle A, X(\epsilon) \rangle = \langle A, X \rangle + \epsilon a_{11} \ge \langle A, X \rangle.$$

Now consider $\bar{X} = \frac{1}{\|X(\epsilon)\|_\theta} X(\epsilon)$. Clearly, $\|\bar{X}\|_\theta = 1$ and $\langle A, \bar{X} \rangle > \langle A, X \rangle > 0$ (contradiction).
Thus all optimal solutions of Problem (6) are nonnegative if $\theta > 1$.

□



3 Sparsity 

As mentioned in the introduction, the $\ell_1$ penalty term in the objective function of (1) is intended
to promote sparsity of $X$. For some very simple convex optimization problems with an $\ell_1$ penalty term,
e.g., the unconstrained problem of minimizing $\|x - c\|^2 + \theta \|x\|_1$ for a given vector $c$, it is known that
sparsity increases monotonically with $\theta$ (i.e., if $x_1^*$ is the optimizer for $\theta_1$ and $x_2^*$ is the optimizer for $\theta_2$
with $\theta_1 < \theta_2$, then the indices of nonzeros of $x_2^*$ are a subset of the indices of nonzeros of $x_1^*$).
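
The monotonicity claim for the simple penalized problem can be illustrated directly. The sketch assumes the quadratic term is the squared Euclidean norm, in which case the minimizer is given coordinate-wise by soft thresholding at level $\theta/2$; the helper names and the vector `c` are choices made for this example.

```python
def soft_threshold(c, theta):
    """Minimizer of sum((x_k - c_k)^2) + theta * sum(|x_k|), computed coordinate-wise."""
    t = theta / 2.0   # stationarity: 2 (x_k - c_k) + theta * sgn(x_k) = 0
    return [max(abs(ck) - t, 0.0) * (1.0 if ck > 0 else -1.0) for ck in c]

def support(x):
    """Indices of the nonzero entries of x."""
    return {k for k, xk in enumerate(x) if xk != 0.0}

c = [3.0, -1.0, 0.5, 2.0]
x_small = soft_threshold(c, 1.5)   # threshold 0.75
x_large = soft_threshold(c, 3.0)   # threshold 1.5
```

With the smaller penalty the nonzero indices are $\{0, 1, 3\}$, and with the larger penalty they shrink to $\{0, 3\}$, so the support for the larger $\theta$ is contained in the support for the smaller one.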

For a more complicated problem such as (1), monotonicity does not hold in general. Nonetheless,
some weaker statements about the relationship between $\theta$ and sparsity are possible. Two such results
are derived in this section. We start with a lemma that leads to a sparsity result.

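Lemma 4. Assume $X = \sigma u v^T$, where $\|u\|_2 = \|v\|_2 = 1$, $u \ge 0$, and $v \ge 0$, is the optimal solution of
Problem (6). If $u_i > u_j = 0$, then

$$\|A\|_\theta^* = \frac{a_i^T v}{\theta \|v\|_1 + u_i} \ge \frac{a_j^T v}{\theta \|v\|_1 + u_j}, \tag{8}$$

where $a_i^T$ and $a_j^T$ are the $i$th and $j$th rows of $A$.

Proof. We again assume here that $A \ne 0$, which means $\|X\|_\theta = 1$. We have:

$$\|X\|_\theta = \sigma + \theta \sigma \|u\|_1 \|v\|_1 = 1 \iff \sigma = \frac{1}{1 + \theta \|u\|_1 \|v\|_1}.$$

Consider $X(\epsilon) = \sigma (u + \epsilon e_i) v^T$, where $\epsilon > 0$ and $e_i$ is the $i$th unit vector; we have:

$$\|u + \epsilon e_i\|_2^2 = \|u\|_2^2 + (u_i + \epsilon)^2 - u_i^2 = 1 + 2u_i \epsilon + \epsilon^2.$$

Thus $\|X(\epsilon)\|_* = \sigma \sqrt{1 + 2u_i \epsilon + \epsilon^2}$. On the other hand, $\|X(\epsilon)\|_1 = \sigma (\|u\|_1 + \epsilon) \|v\|_1$. So we have:

$$\|X(\epsilon)\|_\theta = 1 + \sigma \left( \sqrt{1 + 2u_i \epsilon + \epsilon^2} - 1 + \theta \epsilon \|v\|_1 \right) = 1 + \sigma \epsilon \left( \frac{2u_i + \epsilon}{\sqrt{1 + 2u_i \epsilon + \epsilon^2} + 1} + \theta \|v\|_1 \right).$$

Let $\alpha = \dfrac{2u_i + \epsilon}{\sqrt{1 + 2u_i \epsilon + \epsilon^2} + 1} + \theta \|v\|_1$ and consider $\bar{X} = \dfrac{X(\epsilon)}{1 + \sigma \epsilon \alpha}$; we have $\|\bar{X}\|_\theta = 1$ and

$$\langle A, \bar{X} \rangle = \frac{\langle A, X(\epsilon) \rangle}{1 + \sigma \epsilon \alpha} = \frac{\|A\|_\theta^* + \sigma \epsilon \, a_i^T v}{1 + \sigma \epsilon \alpha} \le \|A\|_\theta^*.$$

With $\sigma > 0$ and $\epsilon > 0$, we obtain the following inequality:

$$a_i^T v \le \left( \frac{2u_i + \epsilon}{\sqrt{1 + 2u_i \epsilon + \epsilon^2} + 1} + \theta \|v\|_1 \right) \|A\|_\theta^*.$$

Taking the limit $\epsilon \to 0^+$, we have:

$$\|A\|_\theta^* \ge \frac{a_i^T v}{\theta \|v\|_1 + u_i}.$$

Now consider the case in which $u_i > 0$ and set $X(\epsilon) = \sigma (u - \epsilon e_i) v^T$, where $0 < \epsilon < u_i$. Similarly,
we have:

$$\|X(\epsilon)\|_\theta = 1 - \sigma \epsilon \left( \frac{2u_i - \epsilon}{\sqrt{1 - 2u_i \epsilon + \epsilon^2} + 1} + \theta \|v\|_1 \right).$$

This implies the following inequality:

$$a_i^T v \ge \left( \frac{2u_i - \epsilon}{\sqrt{1 - 2u_i \epsilon + \epsilon^2} + 1} + \theta \|v\|_1 \right) \|A\|_\theta^*.$$

Again, taking the limit $\epsilon \to 0^+$, we have:

$$\|A\|_\theta^* \le \frac{a_i^T v}{\theta \|v\|_1 + u_i}.$$

From these two results, we can see that if $u_i > u_j = 0$, then

$$\frac{a_i^T v}{\theta \|v\|_1 + u_i} = \|A\|_\theta^* \ge \frac{a_j^T v}{\theta \|v\|_1 + u_j}.$$

□

Since the roles of columns and rows are interchangeable, we also have the following result. If
$v_k > v_l = 0$, then

$$\|A\|_\theta^* = \frac{u^T A_k}{\theta \|u\|_1 + v_k} \ge \frac{u^T A_l}{\theta \|u\|_1 + v_l}, \tag{9}$$

where $A_k$ and $A_l$ are the $k$th and $l$th columns of $A$.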

The sparsity structure of $X = \sigma u v^T$ depends on the sparsity structure of $u$ and $v$. The results
obtained above help us derive some conditions under which a row (or column) of $X$ is zero.

Corollary 1. Consider two rows $a_i^T$ and $a_j^T$ of matrix $A$. If $\min_k \{a_{ik}\} \ge \alpha \max_k \{a_{jk}\}$, where $\alpha > 1$,
then for every $\theta > \dfrac{1}{\alpha - 1}$ and for every nonnegative rank-one optimal solution $X$ of Problem (6), the
$j$th row of $X$ is zero.

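Proof. Assume that $X = \sigma u v^T$, where $u \ge 0$ and $v \ge 0$, and $\|u\|_2 = \|v\|_2 = 1$. We have:

$$\frac{a_i^T v}{\theta \|v\|_1 + u_i} \ge \frac{\min_k \{a_{ik}\} \|v\|_1}{\theta \|v\|_1 + 1} \ge \frac{\alpha \max_k \{a_{jk}\} \|v\|_1}{\theta \|v\|_1 + 1} = \frac{\alpha \max_k \{a_{jk}\}}{\theta + 1/\|v\|_1} \ge \frac{\alpha \max_k \{a_{jk}\}}{\theta + 1},$$

since $\|v\|_1 \ge \|v\|_2 = 1$. On the other hand, we also have the following inequality:

$$\frac{a_j^T v}{\theta \|v\|_1 + u_j} \le \frac{\max_k \{a_{jk}\} \|v\|_1}{\theta \|v\|_1} = \frac{\max_k \{a_{jk}\}}{\theta}.$$

We have:

$$\frac{\alpha \max_k \{a_{jk}\}}{\theta + 1} - \frac{\max_k \{a_{jk}\}}{\theta} = \max_k \{a_{jk}\} \, \frac{(\alpha - 1)\theta - 1}{\theta (\theta + 1)} > 0, \quad \forall\, \theta > \frac{1}{\alpha - 1}.$$

Thus we have:

$$\frac{a_i^T v}{\theta \|v\|_1 + u_i} > \frac{a_j^T v}{\theta \|v\|_1 + u_j}, \quad \forall\, \theta > \frac{1}{\alpha - 1},$$

which means $u_j = 0$ according to Lemma 4. Thus the $j$th row of $X$ is zero. □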
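
As a worked instance of Corollary 1, the sketch below computes the largest admissible ratio $\alpha = \min_k a_{ik} / \max_k a_{jk}$ for two given rows and the corresponding bound $1/(\alpha - 1)$ on $\theta$. The function name `row_elimination_threshold` and the sample rows are assumptions of this example; the function reports only when the corollary's hypothesis applies and does not solve the optimization problem.

```python
def row_elimination_threshold(row_i, row_j):
    """Return 1/(alpha - 1) with alpha = min(row_i) / max(row_j), or None if alpha <= 1."""
    lo, hi = min(row_i), max(row_j)
    if hi <= 0 or lo <= hi:
        return None   # ratio undefined (zero row j) or no alpha > 1 from these rows
    alpha = lo / hi
    return 1.0 / (alpha - 1.0)

# Row i dominates row j entrywise by a factor of at least 2.
threshold = row_elimination_threshold([4.0, 6.0, 5.0], [1.0, 2.0, 0.5])
```

Here $\alpha = 4/2 = 2$, so for every $\theta > 1$ the corollary states that row $j$ of any nonnegative rank-one optimal solution is zero.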
We would like to use these results to build up results for columns and rows simultaneously. More
exactly, given subsets $\mathcal{I} \subset \{1, \ldots, m\}$ and $\mathcal{J} \subset \{1, \ldots, n\}$, we would like to obtain conditions on
magnitudes of elements of $A(\mathcal{I}, \mathcal{J})$ as compared to those of the remaining elements of $A$ to guarantee
that all rows and columns that are not in $\mathcal{I}$ and $\mathcal{J}$ have to be zero in the nonnegative rank-one optimal
matrix $X$ of Problem (6) for $\theta \ge \theta_0$. One of the difficulties here is that under these conditions, there is
a coupling relationship between rows and columns. More exactly, in order to prove the rows that are
not in $\mathcal{I}$ are zero, we need to prove the columns that are not in $\mathcal{J}$ are small or zero at the same time.

Lemma 4 and Corollary 1 are based on local optimality conditions with respect to rows or columns.
We can obtain additional results on the sparsity of the optimal solution $X$ using the global optimality
conditions.






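The following theorem states that if the weight of the nonnegative matrix $A$ is concentrated in a
particular subblock, then for $\theta$ sufficiently large, the optimal solution $X$ will have nonzero entries only in that
subblock. "Concentration of weight" in this sense means that the average of those entries dominates all
the other entries of the matrix.

As a special case, this theorem implies that if the maximum entry of $A$ is unique, then for $\theta$
sufficiently large, $X$ will be a singleton matrix whose unique nonzero entry corresponds to the maximum
entry of $A$.

Theorem 4. Assume $A \ge 0$. Let $\mathcal{I}$ and $\mathcal{J}$ be subsets of $\{1, \ldots, m\}$ and $\{1, \ldots, n\}$, respectively;
$|\mathcal{I}| = M$ and $|\mathcal{J}| = N$. Define $\bar{a}(\mathcal{I}, \mathcal{J}) = \dfrac{1}{MN} \sum_{i \in \mathcal{I},\, j \in \mathcal{J}} a_{ij}$ and $a_{\max}(\mathcal{I}, \mathcal{J}) = \max_{(i,j) \notin \mathcal{I} \times \mathcal{J}} a_{ij}$. If
$\bar{a}(\mathcal{I}, \mathcal{J}) > a_{\max}(\mathcal{I}, \mathcal{J})$, then all optimal solutions $X$ of Problem (6) are sparse, $X_{ij} = 0$ for all $(i,j) \notin \mathcal{I} \times \mathcal{J}$, for
all $\theta > \theta_B$. Here

$$\theta_B = \frac{1}{\sqrt{MN}} \left( \frac{a_{\max}(\mathcal{I}, \mathcal{J}) + \sqrt{MN}\, \bar{a}(\mathcal{I}, \mathcal{J})}{\bar{a}(\mathcal{I}, \mathcal{J}) - a_{\max}(\mathcal{I}, \mathcal{J})} \right).$$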

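Proof. Assume there exists an optimal solution $X$ of Problem (6) such that $X_{ij} \ne 0$ for some
$(i,j) \notin \mathcal{I} \times \mathcal{J}$ when $\theta > \theta_B$. We have $\theta > \theta_B \ge 1$; therefore,
according to Theorem El, $X \ge 0$. Thus $x_{ij} > 0$. We also have $A \ne 0$; therefore
$\|A\|_\theta^* > 0$ and $\|X\|_\theta = 1$. Consider two cases, $a_{ij} = 0$ and $a_{ij} > 0$.

If $a_{ij} = 0$, let $X_0 = X - x_{ij}E_{ij}$, where $E_{ij}$ is the matrix of all zeros except for the element
$E_{ij}(i,j) = 1$. We have $\|X_0\|_* \le \|X\|_* + x_{ij}$ and $\|X_0\|_1 = \|X\|_1 - x_{ij}$. Thus
\[ \|X_0\|_\theta \le \|X\|_\theta + (1 - \theta)x_{ij} < 1, \quad \forall\, \theta > \theta_B \ge 1. \]
Since $a_{ij} = 0$, we also have $\langle A, X_0 \rangle = \langle A, X \rangle = \|A\|_\theta^* > 0$. Thus
$X_0 \ne 0$, or $\|X_0\|_\theta > 0$. Define $\bar X_0 = X_0/\|X_0\|_\theta$; then $\bar X_0$ is a feasible solution of
Problem (6) with the objective value
\[ \langle A, \bar X_0 \rangle = \frac{\|A\|_\theta^*}{\|X_0\|_\theta} > \|A\|_\theta^* \]
(contradiction).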



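Now consider the case when $a_{ij} > 0$. Define $D = e_{\mathcal{I}} e_{\mathcal{J}}^T - MN r\, e_i e_j^T$, where
$r = \bar a(\mathcal{I},\mathcal{J})/a_{ij}$, $e_{\mathcal{I}} = \sum_{k \in \mathcal{I}} e_k$, $e_i \in \mathbb{R}^m$ is
the $i$th unit vector in $\mathbb{R}^m$, and similarly, $e_{\mathcal{J}} = \sum_{l \in \mathcal{J}} e_l$,
$e_j \in \mathbb{R}^n$ is the $j$th unit vector in $\mathbb{R}^n$.

The ratio $r$ is well defined since $a_{ij} > 0$, and $r > 1$. We now consider a new solution
$X_\alpha = X + \alpha D$, where $0 < \alpha \le x_{ij}/(MNr)$. Clearly, $X_\alpha \ge 0$. Thus we have
$\|X_\alpha\|_1 = \|X\|_1 + \alpha MN(1 - r)$. Applying the triangle inequality, we can bound $\|X_\alpha\|_*$
as follows:
\[ \|X_\alpha\|_* \le \|X\|_* + \alpha \|D\|_* \le \|X\|_* + \alpha(\sqrt{MN} + MNr). \]
Thus we have:
\[ \|X_\alpha\|_\theta \le \|X\|_\theta + \alpha\sqrt{MN}\left[ (1 + r\sqrt{MN}) + \theta\sqrt{MN}(1 - r) \right]. \]
Since
\[ \theta > \theta_B = \frac{1}{\sqrt{MN}} \left( \frac{a_{\max}(\mathcal{I},\mathcal{J}) + \sqrt{MN}\,\bar a(\mathcal{I},\mathcal{J})}
   {\bar a(\mathcal{I},\mathcal{J}) - a_{\max}(\mathcal{I},\mathcal{J})} \right) \]
and $0 < a_{ij} \le a_{\max}(\mathcal{I},\mathcal{J}) < \bar a(\mathcal{I},\mathcal{J})$, we have:
\[ \theta > \frac{1}{\sqrt{MN}} \left( \frac{1 + r\sqrt{MN}}{r - 1} \right). \]
This implies that $\|X_\alpha\|_\theta < \|X\|_\theta = 1$ for all $0 < \alpha \le x_{ij}/(MNr)$. Now consider the
scaled solution $\bar X_\alpha = X_\alpha/\|X_\alpha\|_\theta$, which is also a feasible solution of Problem (6). In
terms of the objective, we have $\langle A, D \rangle = e_{\mathcal{I}}^T A e_{\mathcal{J}} - MN r a_{ij} = 0$. Thus
$\langle A, X_\alpha \rangle = \langle A, X \rangle = \|A\|_\theta^*$ for all $\alpha$. We then have:
\[ \langle A, \bar X_\alpha \rangle = \frac{\|A\|_\theta^*}{\|X_\alpha\|_\theta} > \|A\|_\theta^*, \]
which is a contradiction because $\|A\|_\theta^*$ is the optimal value of Problem (6).

Thus we can conclude that if $A \ge 0$, all optimal solutions $X$ of Problem (6) are sparse with $X_{ij} = 0$
for all $(i,j) \notin \mathcal{I} \times \mathcal{J}$ when $\theta > \theta_B$. □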
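As a small numerical illustration of Theorem 4, the sketch below evaluates the block average, the largest entry outside the block, and the resulting threshold, taking $\theta_B = (a_{\max} + \sqrt{MN}\,\bar a)/(\sqrt{MN}(\bar a - a_{\max}))$ as the form consistent with the proof above. The function name `theta_b` and the two test matrices are illustrative assumptions, and the block is assumed to be a proper submatrix so that at least one entry lies outside it.

```python
import math

def theta_b(A, rows, cols):
    """Threshold theta_B of Theorem 4 for the block rows x cols of a nonnegative matrix A.

    A is a list of lists; rows and cols are collections of 0-based indices.
    """
    rows, cols = set(rows), set(cols)
    M, N = len(rows), len(cols)
    inside = [A[i][j] for i in rows for j in cols]
    outside = [A[i][j] for i in range(len(A)) for j in range(len(A[0]))
               if not (i in rows and j in cols)]
    a_bar = sum(inside) / (M * N)      # average entry inside the block
    a_max = max(outside)               # largest entry outside the block
    if a_bar <= a_max:
        raise ValueError("hypothesis a_bar > a_max of Theorem 4 fails")
    root = math.sqrt(M * N)
    return (a_max + root * a_bar) / (root * (a_bar - a_max))

# A 2 x 2 block of 4's surrounded by 1's: a_bar = 4, a_max = 1, sqrt(MN) = 2.
A = [[4.0, 4.0, 1.0],
     [4.0, 4.0, 1.0],
     [1.0, 1.0, 1.0]]
print(theta_b(A, rows=[0, 1], cols=[0, 1]))

# Singleton block at the unique maximum entry (the special case mentioned before the theorem).
B = [[5.0, 1.0],
     [2.0, 3.0]]
print(theta_b(B, rows=[0], cols=[0]))
```

In the singleton case $M = N = 1$ the threshold reduces to the sum of the largest and second-largest entries divided by their difference.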



4 Random noise 

The main technical result of this article is that the proposed algorithm can find a large rank-one 
submatrix hidden in a substantial amount of noise. The noise takes two forms: the rank-one submatrix 
itself has random noise added to it (so that its rank is no longer 1), and the entries outside the rank-one 
submatrix are generated by a random process. 

First, we recall the following definition: a random variable $x$ is $b$-subgaussian if its mean is zero, and
if there exists a $b > 0$ such that for all $t \ge 0$,
\[ P(|x| \ge t) \le \exp(-t^2/(2b^2)). \tag{10} \]

For example, a normally distributed variable or any mean-zero variable with a discrete distribution is 
subgaussian. 
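To make the definition concrete, the following sketch compares the exact two-sided tail of a standard normal variable with the right-hand side of (10) for $b = 1$. The helper names are illustrative assumptions; the exact tail is computed from the identity $P(|Z| \ge t) = \mathrm{erfc}(t/\sqrt{2})$.

```python
import math

def gaussian_two_sided_tail(t):
    """Exact P(|Z| >= t) for a standard normal Z, via the complementary error function."""
    return math.erfc(t / math.sqrt(2.0))

def subgaussian_bound(t, b):
    """Right-hand side of (10): exp(-t^2 / (2 b^2))."""
    return math.exp(-t * t / (2.0 * b * b))

for t in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(t, gaussian_two_sided_tail(t), subgaussian_bound(t, b=1.0))
```

On this grid the exact tail stays below the bound, with equality only at $t = 0$.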

The result of this section is the following bound. For this entire section, we adopt the notation that
$e_M$ denotes the vector of all 1's of length $M$, and similarly for $e_N$.

Theorem 5. Let $A$ be an $m \times n$ matrix defined as follows:
\[ A = \begin{pmatrix} \sigma_0 u_0 v_0^T & 0 \\ 0 & 0 \end{pmatrix}
     + \begin{pmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{pmatrix}, \tag{11} \]
where $\sigma_0 > 0$, $u_0 \in \mathbb{R}^M$, $u_0 > 0$, $M \le m$, and $v_0 \in \mathbb{R}^N$, $v_0 > 0$, $N \le n$.
Furthermore, assume that $u_0 = e_M + p$ with $\|p\|_2 \le c_1\sqrt{M}$, and $v_0 = e_N + q$ with
$\|q\|_2 \le c_2\sqrt{N}$. The matrix $R$ is a random matrix with i.i.d. nonnegative elements $r_{ij}$ with mean
$c_3\sigma_0$, where $c_3 > 0$ is a constant, such that $r_{ij}/\sigma_0 - c_3$ is $b$-subgaussian. Here
$c_1, c_2, c_3$ are positive constants. Assume that these scalar constants satisfy the following relations:
\[ c_5 < 1/3, \qquad c_3 + c_5 < 1, \tag{12} \]
where $c_5$ is chosen to satisfy
\[ c_5 > c_1 + c_2 + c_1 c_2. \tag{13} \]
Under these hypotheses concerning $A$, and assuming $\theta$ satisfies
\[ \theta \le \min\left\{ \frac{1}{c_3 + c_5},\ \frac{1 + c_3 - 3c_5}{2c_5} \right\} \cdot \frac{1}{\sqrt{MN}}, \tag{14} \]
\[ \theta \ge \frac{2c_3}{1 - c_3 - c_5} \cdot \frac{1}{\sqrt{MN}}, \tag{15} \]
the solution $X$ to Problem (1) is a rank-one matrix with positive entries in positions that are indexed
by $\{1,\ldots,M\} \times \{1,\ldots,N\}$ and zeros elsewhere with probability exponentially close to 1 (i.e., of
the form $1 - \exp(-(M+N)^{\mathrm{const}})$), provided that $MN \ge \Omega((M+N)^{4/3})$ and
$MN \ge \Omega(m+n)$. Here, the constants implicit in the $\Omega(\cdot)$ notation depend on $b$ and $c_5$. See
(32)-(35) below for a detailed presentation of these constants.
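The random model (11) can be simulated directly. The sketch below draws one instance in the simplest setting $p = q = 0$ (so $u_0 = e_M$ and $v_0 = e_N$); the uniform noise law on $[0, 2c_3\sigma_0]$ is an assumption made only for illustration, chosen because it is nonnegative, has mean $c_3\sigma_0$, and is bounded. The function name `planted_matrix` and all parameter values are hypothetical.

```python
import numpy as np

def planted_matrix(m, n, M, N, sigma0=1.0, c3=0.2, seed=0):
    """Sample A as in (11) with p = q = 0, i.e. u0 = e_M and v0 = e_N.

    Noise entries are i.i.d. uniform on [0, 2*c3*sigma0] (illustrative assumption):
    nonnegative with mean c3*sigma0.
    """
    rng = np.random.default_rng(seed)
    R = rng.uniform(0.0, 2.0 * c3 * sigma0, size=(m, n))
    A = R.copy()
    A[:M, :N] += sigma0            # adds sigma0 * e_M e_N^T to the leading block
    return A

A = planted_matrix(m=60, n=80, M=10, N=12)
print(A.shape, A[:10, :12].mean(), A[10:, :].mean())
```

With these values the distinguished block has entries near $\sigma_0(1 + c_3)$ on average, while the remaining entries average $c_3\sigma_0$.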

Remarks. 

1. Naturally, the theorem also applies if the $MN$ distinguished entries occur as any $M \times N$ submatrix
of $A$; we have numbered the distinguished submatrix first in order to simplify notation.

2. It is not enough to assume simply that $u_0 > 0$ and $v_0 > 0$ because if these vectors have very
small entries, then they cannot be distinguished from the noise.

3. This theorem is not a consequence of Theorem 4 because the hypotheses do not force entries
outside the distinguished block to be smaller than the average of the distinguished block's entries.

4. The relationships among the constants as well as (14), (15) can all be satisfied provided $c_3, c_5$ are
sufficiently small.

5. The result holds with probability exponentially close to 1 as long as $M \sim N$ and $M \ge \Omega(m^{1/2})$,
$N \ge \Omega(n^{1/2})$. Thus, the rank-one submatrix can be much smaller than the entire matrix $A$.

Before beginning the proof of the theorem, we require the following key lemma regarding matrices
constructed from independent $b$-subgaussian random variables.

Lemma 5. Let $B \in \mathbb{R}^{m \times n}$ be a random matrix, where $b_{ij}$ are independent $b$-subgaussian random
variables for all $i = 1,\ldots,m$, and $j = 1,\ldots,n$. Then for any $u > 0$,

(i) $P(\|B\| \ge u) \le \exp\left( -\left( \dfrac{8u^2}{81b^2} - (\log 7)(m+n) \right) \right)$,

(ii) $P(\|CB\| \ge u) \le \exp\left( -\left( \dfrac{8u^2}{81b^2\|C\|^2} - (\log 7)(m+n) \right) \right)$, where $C$ is a
deterministic matrix.

The proof of this lemma follows the proof techniques by Litvak et al. [H]. Major steps are shown 
as follows. 
Proof. 

(i) We have: $\|B\| = \max_{\|x\|_2 = \|y\|_2 = 1} y^T B x$. We discretize the unit balls in $\mathbb{R}^n$ and
$\mathbb{R}^m$ by finite $\epsilon$-nets, where $\epsilon \in (0,1)$. An $\epsilon$-net of a set $\mathcal{K}$ is a
subset $\mathcal{N}$ such that for all $x \in \mathcal{K}$, there exists $y \in \mathcal{N}$ such that
$\|x - y\|_2 \le \epsilon$. Using a constructive proof, we can show that there exists a finite $\epsilon$-net of the
unit ball in $\mathbb{R}^n$ with cardinality no more than $\left( \frac{2}{\epsilon} + 1 \right)^n$. Let $\mathcal{N}$
and $\mathcal{M}$ be the finite $\epsilon$-nets of the unit balls in $\mathbb{R}^n$ and $\mathbb{R}^m$ with minimum
cardinality, respectively. Applying the triangle inequality, we have:
\[ \|B\| \le \frac{1}{(1-\epsilon)^2} \max_{x \in \mathcal{N},\, y \in \mathcal{M}} y^T B x. \]
We can bound the tail probability $P(\|B\| \ge u)$ as follows:
\[ P(\|B\| \ge u) \le \left( \frac{2}{\epsilon} + 1 \right)^{m+n}
   \max_{x \in \mathcal{N},\, y \in \mathcal{M}} P\left( y^T B x \ge (1-\epsilon)^2 u \right). \]
The $b_{ij}$ are independent $b$-subgaussian random variables; therefore,
$y^T B x = \sum_{i=1}^m \sum_{j=1}^n (x_j y_i) b_{ij}$ is also a $b$-subgaussian random variable since
$\sum_{i=1}^m \sum_{j=1}^n (x_j y_i)^2 = \|x\|_2^2 \|y\|_2^2 = 1$. Thus we have:
\[ P(\|B\| \ge u) \le \left( \frac{2}{\epsilon} + 1 \right)^{m+n} e^{-\frac{(1-\epsilon)^4 u^2}{2b^2}}. \]
Letting $\epsilon = 1/3$, we obtain the inequality (i).

(ii) We have: $y^T C B x = (C^T y)^T B x$, thus:
\[ P\left( y^T C B x \ge u \right) \le e^{-\frac{u^2}{2b^2 \|C^T y\|_2^2}} \le e^{-\frac{u^2}{2b^2 \|C\|^2}}. \]
Applying similar arguments, we can then obtain the inequality (ii) of the Lemma.

□ 
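To get a feeling for the scale at which Lemma 5(i) becomes informative, the sketch below evaluates the bound $\exp(-(8u^2/(81b^2) - (\log 7)(m+n)))$ and the value of $u$ at which it first drops to 1. The function names are illustrative assumptions, and the bound is capped at 1 since it is a bound on a probability.

```python
import math

def lemma5_tail_bound(u, b, m, n):
    """Bound on P(||B|| >= u) from Lemma 5(i), capped at 1."""
    exponent = 8.0 * u * u / (81.0 * b * b) - math.log(7.0) * (m + n)
    if exponent <= 0.0:
        return 1.0                     # the bound is vacuous in this range
    return math.exp(-exponent)

def nontrivial_threshold(b, m, n):
    """Value of u at which the exponent in Lemma 5(i) changes sign."""
    return math.sqrt(81.0 * b * b * math.log(7.0) * (m + n) / 8.0)

u0 = nontrivial_threshold(b=1.0, m=100, n=100)
print(u0)
print(lemma5_tail_bound(0.5 * u0, b=1.0, m=100, n=100))
print(lemma5_tail_bound(2.0 * u0, b=1.0, m=100, n=100))
```

The threshold grows like $b\sqrt{m+n}$, which is the reason the later conditions compare $MN$ with $m + n$ and with powers of $M + N$.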

We now turn to the proof of the main theorem. We would like to find conditions on $\theta$ and the
constants so that Problem (1) has an optimal solution $X$ of the form
\[ X = \begin{pmatrix} \sigma_1 u_1 v_1^T & 0 \\ 0 & 0 \end{pmatrix}, \]
where $u_1 > 0$, $\|u_1\|_2 = 1$, and $v_1 > 0$, $\|v_1\|_2 = 1$. If $u_1$ and $v_1$ are determined, $\sigma_1$ can be
easily calculated in order to satisfy the condition $\langle A, X \rangle = 1$ of the optimal solution. Thus the
main task is to find $u_1$ and $v_1$ if they exist. We will construct them using optimality conditions derived in
the previous section for Problem (6) (equivalent to Problem (1)) and its dual, Problem (7). Defining
$u = [u_1; 0] \in \mathbb{R}^m$ and $v = [v_1; 0] \in \mathbb{R}^n$, we can then write the optimality conditions as
follows:

There exist $Y$ and $Z$ such that $A = Y + Z$ and
\[ Y = \|A\|_\theta^* (uv^T + W), \qquad Z = \theta \|A\|_\theta^* V, \]
where $\|W\| \le 1$, $W^T u = 0$, $Wv = 0$, and $\|V\|_\infty \le 1$, $V_{11} = e_M e_N^T$.

These conditions come from the properties of the subgradients $\partial \|Y\|$ and $\partial \|Z\|_\infty$ and the
fact that $X$ belongs to these sets (up to appropriate scaling factors).

In the following analysis, we will construct $(V, W)$ so that the optimality conditions are satisfied.
The entries of these matrices will be constructed separately for the four subblocks of $A$, starting with
the (1,1) block. Breaking the equation $Y + Z = A$ into blocks and scaling by $1/\|A\|_\theta^*$, we obtain the
following more detailed optimality conditions:
\[ u_1 v_1^T + W_{11} + \theta e_M e_N^T = A_{11}/\|A\|_\theta^* = (\sigma_0 u_0 v_0^T + R_{11})/\|A\|_\theta^*, \tag{16} \]
\[ W_{ij} + \theta V_{ij} = A_{ij}/\|A\|_\theta^* = R_{ij}/\|A\|_\theta^*, \tag{17} \]
where the second line applies to $(i,j)$ equal to (1,2), (2,1) and (2,2). Following this block matrix
notation, the remaining optimality conditions to be established are $W_{11}^T u_1 = 0$, $W_{12}^T u_1 = 0$,
$W_{11} v_1 = 0$, $W_{21} v_1 = 0$, $\|V\|_\infty \le 1$, and $\|W\| \le 1$. We shall establish the latter inequality by
proving more specifically that $\|W_{ij}\| \le 1/2$ for $(i,j) \in \{1,2\} \times \{1,2\}$.

We begin with the (1,1) block of this equation. The conditions $\|W_{11}\| \le 1/2$, $W_{11}^T u_1 = 0$,
$W_{11} v_1 = 0$ imply that $\|u_1 v_1^T + W_{11}\| = 1$. This is because the dominant singular triple of
$u_1 v_1^T + W_{11}$ must be $(1, u_1, v_1)$ by the conditions. Equivalent to $\|u_1 v_1^T + W_{11}\| = 1$ is
\[ \left\| \frac{1}{\|A\|_\theta^*} A_{11} - \theta V_{11} \right\| = 1, \]
where, as noted earlier, we are required to take $V_{11} = e_M e_N^T$.

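Thus the first necessary condition for $X$ to be the optimal solution is that there exists $\lambda > 0$ such
that $f(\lambda) = \|\lambda A_{11} - \theta V_{11}\| = 1$. If such a $\lambda$ is identified, then $u_1$ and $v_1$ can be
easily found since $u_1 v_1^T$ is the rank-one approximation of $\lambda A_{11} - \theta V_{11}$. Note that the
nonnegativity of $u_1$ and $v_1$ will require additional conditions which will be discussed later. We have:
\begin{align*}
\lambda A_{11} - \theta V_{11} &= \lambda\left[ \sigma_0 u_0 v_0^T + R_{11} \right] - \theta e_M e_N^T \\
 &= \lambda\left[ \sigma_0 (e_M + p)(e_N + q)^T + R_{11} \right] - \theta e_M e_N^T \\
 &= (\lambda\sigma_0 - \theta) e_M e_N^T + \lambda\left[ \sigma_0 (e_M q^T + p e_N^T + p q^T) + R_{11} \right] \\
 &= \left[ \lambda\sigma_0(1 + c_3) - \theta \right] e_M e_N^T
    + \lambda\left[ \sigma_0 (e_M q^T + p e_N^T + p q^T) + (R_{11} - c_3\sigma_0 e_M e_N^T) \right].
\end{align*}
We have: $f(\lambda) \to +\infty$ when $\lambda \to +\infty$ since $A_{11} \ne 0$. Now define
$\lambda_0 = \dfrac{\theta}{\sigma_0(1 + c_3)}$ to make the first term vanish, yielding
\begin{align*}
\lambda_0 A_{11} - \theta V_{11} &= \lambda_0\left[ \sigma_0 (e_M q^T + p e_N^T + p q^T) + (R_{11} - c_3\sigma_0 e_M e_N^T) \right] \\
 &= \lambda_0\left[ \sigma_0 P + Q \right],
\end{align*}
where $P = e_M q^T + p e_N^T + p q^T$ and $Q = R_{11} - c_3\sigma_0 e_M e_N^T$. We now bound the spectral norm of
$P$ and $Q$ as follows.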

\begin{align*}
\|P\| &\le \|e_M q^T\| + \|p e_N^T\| + \|p q^T\| \\
      &= \|e_M\|_2 \|q\|_2 + \|p\|_2 \|e_N\|_2 + \|p\|_2 \|q\|_2 \\
      &\le c_2\sqrt{MN} + c_1\sqrt{MN} + c_1 c_2\sqrt{MN} \\
      &= (c_1 + c_2 + c_1 c_2)\sqrt{MN}.
\end{align*}
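The bound on $\|P\|$ is deterministic and easy to check numerically. The sketch below builds $P = e_M q^T + p e_N^T + p q^T$ for random perturbations scaled so that $\|p\|_2 = c_1\sqrt{M}$ and $\|q\|_2 = c_2\sqrt{N}$, and compares its spectral norm with $(c_1 + c_2 + c_1 c_2)\sqrt{MN}$. The helper name `p_matrix` and the parameter values are illustrative assumptions.

```python
import numpy as np

def p_matrix(p, q):
    """P = e_M q^T + p e_N^T + p q^T for perturbation vectors p (length M) and q (length N)."""
    e_M, e_N = np.ones(len(p)), np.ones(len(q))
    return np.outer(e_M, q) + np.outer(p, e_N) + np.outer(p, q)

rng = np.random.default_rng(1)
M, N, c1, c2 = 40, 25, 0.1, 0.2
p = rng.standard_normal(M)
p *= c1 * np.sqrt(M) / np.linalg.norm(p)     # rescale so that ||p||_2 = c1 * sqrt(M)
q = rng.standard_normal(N)
q *= c2 * np.sqrt(N) / np.linalg.norm(q)     # rescale so that ||q||_2 = c2 * sqrt(N)

spectral_norm = np.linalg.norm(p_matrix(p, q), 2)
bound = (c1 + c2 + c1 * c2) * np.sqrt(M * N)
print(spectral_norm, bound)
```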
Matrix $Q/\sigma_0$ is random with i.i.d. elements that are $b$-subgaussian. Thus by Lemma 5(i),
\[ P(\|Q\| \ge u\sigma_0) \le \exp\left( -\left( \frac{8u^2}{81b^2} - (\log 7)(M + N) \right) \right) \]
for any $u > 0$. Let us fix $u = (MN)^{3/8}$ to obtain
\[ P\left( \|Q\| \ge \sigma_0 (MN)^{3/8} \right) \le \exp\left( -\left( \frac{8(MN)^{3/4}}{81b^2} - (\log 7)(M + N) \right) \right). \tag{18} \]
For the remainder of this analysis, we will impose the assumption that the event in (18) does not happen.
At the end of the proof, the right-hand side of (18) will be one of the terms in the failure probability of
identifying the optimal $X$.



Thus, $\|Q\| \le o(1)\sigma_0\sqrt{MN}$, so applying the triangle inequality,
\begin{align*}
\|\sigma_0 P + Q\| &\le \sigma_0\left( c_1 + c_2 + c_1 c_2 + o(1) \right)\sqrt{MN} \\
                   &\le \sigma_0 c_5 \sqrt{MN} \tag{19}
\end{align*}
by (13). (The strict inequality '>' in (13) is used in order to absorb the $o(1)$ term.) Therefore,
\[ f(\lambda_0) = \lambda_0 \|\sigma_0 P + Q\| \le \frac{c_5 \theta \sqrt{MN}}{1 + c_3}. \]
Thus with high probability, $f(\lambda_0) < 1$ if
\[ \theta < \frac{1 + c_3}{c_5 \sqrt{MN}}. \tag{20} \]
Inequality (20) is a consequence of (14) stated in the theorem. This inequality implies $f(\lambda_0) < 1$, and,
due to the continuity of the function $f$, there exists $\lambda^* > \lambda_0$ such that $f(\lambda^*) = 1$. We will
prove that under some additional conditions, this value $\lambda^*$ satisfies all other optimality conditions of
Problem (1) and indeed $\|A\|_\theta^* = 1/\lambda^*$.

Let us recall that $\|\lambda^* A_{11} - \theta V_{11}\| = 1$, i.e.,
$\|(\lambda^*\sigma_0(1 + c_3) - \theta) e_M e_N^T + \lambda^*(\sigma_0 P + Q)\| = 1$. Applying the fact that
$\|e_M e_N^T\| = \sqrt{MN}$ and the triangle inequality twice to this equation yields
\[ [\lambda^*\sigma_0(1 + c_3) - \theta]\sqrt{MN} - \lambda^*\|\sigma_0 P + Q\| \le 1
   \le [\lambda^*\sigma_0(1 + c_3) - \theta]\sqrt{MN} + \lambda^*\|\sigma_0 P + Q\|. \]
Applying (19) yields
\[ [\lambda^*\sigma_0(1 + c_3 - c_5) - \theta]\sqrt{MN} \le 1 \le [\lambda^*\sigma_0(1 + c_3 + c_5) - \theta]\sqrt{MN}. \]
Rearranging this chain of inequalities and using the fact that $1 + c_3 - c_5 > 0$, which follows from (12)
stated in the theorem, yields
\[ \frac{1 + \theta\sqrt{MN}}{\sigma_0(1 + c_3 + c_5)\sqrt{MN}} \le \lambda^* \le
   \frac{1 + \theta\sqrt{MN}}{\sigma_0(1 + c_3 - c_5)\sqrt{MN}}, \tag{21} \]
with high probability.

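We wish to establish that $\lambda^*\sigma_0 - \theta \ge 0$. Using the left inequality in (21) yields:
\begin{align*}
\lambda^*\sigma_0 - \theta &\ge \frac{1 + \theta\sqrt{MN}}{(1 + c_3 + c_5)\sqrt{MN}} - \theta \\
                           &= \frac{1 - (c_3 + c_5)\theta\sqrt{MN}}{(1 + c_3 + c_5)\sqrt{MN}}.
\end{align*}
Thus, nonnegativity of $\lambda^*\sigma_0 - \theta$ is implied by the inequality
$\theta \le 1/((c_3 + c_5)\sqrt{MN})$, which is a consequence of assumption (14).
Since $\lambda^*\sigma_0 - \theta \ge 0$,
\[ \lambda^* A_{11} - \theta V_{11} = (\lambda^*\sigma_0 - \theta) e_M e_N^T + \lambda^*(\sigma_0 P + R_{11}) \ge 0. \]
Applying the Perron-Frobenius theorem, we obtain the positivity of $u_1$ and $v_1$.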
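The existence argument for $\lambda^*$ suggests a simple numerical procedure: start from $\lambda_0 = \theta/(\sigma_0(1 + c_3))$, where $f(\lambda_0) < 1$, and bisect on $f(\lambda) = \|\lambda A_{11} - \theta e_M e_N^T\|$ until $f = 1$. The sketch below is one way to do this under illustrative assumptions ($p = q = 0$, uniform noise on $[0, 2c_3\sigma_0]$, hypothetical parameter values and function names); it then reads $u_1, v_1$ off the leading singular pair and checks their positivity.

```python
import numpy as np

def f(lam, A11, theta):
    """f(lambda) = || lambda * A11 - theta * e_M e_N^T || (spectral norm)."""
    return np.linalg.norm(lam * A11 - theta * np.ones_like(A11), 2)

def solve_lambda_star(A11, theta, lam0, tol=1e-12):
    """Bisection for f(lambda) = 1 on [lam0, inf), assuming f(lam0) < 1."""
    lo, hi = lam0, 2.0 * lam0
    while f(hi, A11, theta) < 1.0:     # grow the bracket until f exceeds 1
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if f(mid, A11, theta) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = np.random.default_rng(0)
M, N, sigma0, c3, theta = 30, 30, 1.0, 0.2, 0.03
A11 = sigma0 * np.ones((M, N)) + rng.uniform(0.0, 2 * c3 * sigma0, size=(M, N))
lam0 = theta / (sigma0 * (1.0 + c3))
lam_star = solve_lambda_star(A11, theta, lam0)

U, s, Vt = np.linalg.svd(lam_star * A11 - theta * np.ones((M, N)))
u1, v1 = U[:, 0], Vt[0, :]
if u1.sum() < 0:                       # resolve the sign ambiguity of the SVD
    u1, v1 = -u1, -v1
print(lam0, lam_star, s[0], u1.min() > 0, v1.min() > 0)
```

Because $f$ is a norm of an affine function of $\lambda$, it is convex, so once $f(\lambda_0) < 1$ there is a single crossing of the level 1 to the right of $\lambda_0$ and bisection is enough.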

We also need $\|W_{11}\| \le 1/2$. Recall $\|W_{11}\| = \sigma_2(\lambda^* A_{11} - \theta V_{11})$, the second largest
singular value of $\lambda^* A_{11} - \theta V_{11}$, since $\lambda^* A_{11} - \theta V_{11} = u_1 v_1^T + W_{11}$. Using the
well-known fact that
\[ \sigma_2(A) = \min\{ \|A - S\| : \operatorname{rank}(S) \le 1 \}, \]
we obtain
\[ \|W_{11}\| \le \|\lambda^*(\sigma_0 P + Q)\|. \]
Here we selected $S$ to be $[\lambda^*\sigma_0(1 + c_3) - \theta] e_M e_N^T$. With high probability, we obtain the
bound $\|W_{11}\| \le \lambda^*\sigma_0 c_5\sqrt{MN}$ from (19).

Using the upper bound on $\lambda^*$ from (21), we have:
\[ \|W_{11}\| \le \frac{(1 + \theta\sqrt{MN}) c_5}{1 + c_3 - c_5}. \]
In order to obtain $\|W_{11}\| \le 1/2$, a sufficient condition is
\[ \frac{(1 + \theta\sqrt{MN}) c_5}{1 + c_3 - c_5} \le \frac{1}{2}, \tag{22} \]
which is rearranged as
\[ \theta \le \left( \frac{1 + c_3 - 3c_5}{2c_5} \right) \frac{1}{\sqrt{MN}}. \]
The latter inequality follows from (14); the numerator of the right-hand side is positive by (12).
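The step from $\sigma_2$ to $\|\lambda^*(\sigma_0 P + Q)\|$ only uses the fact that subtracting any rank-one matrix cannot leave a residual smaller than the second singular value. A quick numerical check of that fact, with an arbitrary multiple of $e_M e_N^T$ playing the role of $S$ and a small Gaussian matrix standing in for the perturbation (both illustrative assumptions), is:

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 20, 15
S = 0.7 * np.ones((M, N))                    # rank-one part, a multiple of e_M e_N^T
E = 0.05 * rng.standard_normal((M, N))       # stands in for lambda^* (sigma_0 P + Q)
F = S + E

sigma = np.linalg.svd(F, compute_uv=False)   # singular values in decreasing order
residual = np.linalg.norm(F - S, 2)          # equals ||E||
print(sigma[1], residual)
```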
Turning to (17) when $(i,j) = (2,2)$, we need to find $W_{22}$ and $V_{22}$ that satisfy
\[ \lambda^* R_{22} = W_{22} + \theta V_{22}, \]
$\|W_{22}\| \le 1/2$, and $\|V_{22}\|_\infty \le 1$. Consider the assignment
$V_{22} = \dfrac{\lambda^*\sigma_0 c_3}{\theta} e_{m-M} e_{n-N}^T$ and $W_{22} = \lambda^* R_{22} - \theta V_{22}$. The
coefficient $\lambda^*\sigma_0 c_3/\theta$ is chosen for the definition of $V_{22}$ so that the entries of the
remainder term $W_{22}$ have mean zero.

The requirement $\|V_{22}\|_\infty \le 1$ is satisfied if and only if $\lambda^*\sigma_0 c_3/\theta \le 1$. Because of the
upper bound on $\lambda^*$ established by (21), this requirement is satisfied if
\[ \frac{c_3(1 + \theta\sqrt{MN})}{\theta(1 + c_3 - c_5)\sqrt{MN}} \le 1
   \iff \theta \ge \frac{c_3}{(1 - c_5)\sqrt{MN}}, \quad c_5 < 1. \]
This inequality is assured by (12) and (15). (In particular, (12) implies $c_5 < 1$.)

To bound $\|W_{22}\|$, consider $W_{22}/(\lambda^*\sigma_0)$, which is a random matrix with i.i.d. elements that are
$b$-subgaussian. Applying Lemma 5(i) to $W_{22}/(\lambda^*\sigma_0)$ and taking $u = 1/(2\lambda^*\sigma_0)$ yields
\[ P(\|W_{22}\| \ge 1/2) \le \exp\left( -\left( \frac{2}{81b^2(\lambda^*\sigma_0)^2} - (\log 7)(m - M + n - N) \right) \right). \]
Use the upper bound on $\lambda^*$ from (21) to obtain the following tail bound:
\[ P(\|W_{22}\| \ge 1/2) \le \exp\left( -\left( \frac{2(1 + c_3 - c_5)^2 MN}{81b^2(1 + \theta\sqrt{MN})^2}
   - (\log 7)(m - M + n - N) \right) \right). \]
From (22) we obtain
\[ \frac{(1 + c_3 - c_5)^2}{(1 + \theta\sqrt{MN})^2} \ge 4c_5^2, \tag{23} \]
hence
\[ P(\|W_{22}\| \ge 1/2) \le \exp\left( -\left( \frac{8c_5^2 MN}{81b^2} - (\log 7)(m - M + n - N) \right) \right). \tag{24} \]
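The (2,2) block construction is explicit enough to write down. The sketch below forms the constant matrix $V_{22}$ and the remainder $W_{22} = \lambda^* R_{22} - \theta V_{22}$ and checks that the remainder is centered. The numerical values of $\lambda^*$, $\theta$, $\sigma_0$, $c_3$, the uniform noise law, and the function name `block22` are assumptions made for illustration only.

```python
import numpy as np

def block22(R22, lam, theta, sigma0, c3):
    """Return (V22, W22) with V22 constant and W22 = lam * R22 - theta * V22."""
    V22 = (lam * sigma0 * c3 / theta) * np.ones_like(R22)
    W22 = lam * R22 - theta * V22
    return V22, W22

rng = np.random.default_rng(3)
sigma0, c3, lam, theta = 1.0, 0.2, 0.05, 0.03
R22 = rng.uniform(0.0, 2 * c3 * sigma0, size=(200, 300))   # mean c3 * sigma0
V22, W22 = block22(R22, lam, theta, sigma0, c3)
print(np.abs(V22).max(), W22.mean())
```

Since $W_{22} = \lambda^*(R_{22} - c_3\sigma_0 e e^T)$, its empirical mean is close to zero, and the common entry of $V_{22}$ is $\lambda^*\sigma_0 c_3/\theta$, which must not exceed 1.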
Now consider (17) when $(i,j) = (1,2)$. Again we need to find $W_{12}$ and $V_{12}$ such that
\[ \lambda^* R_{12} = W_{12} + \theta V_{12}, \]
$\|W_{12}\| \le 1/2$, $\|V_{12}\|_\infty \le 1$, and $W_{12}^T u_1 = 0$. We construct $W_{12}$ and $V_{12}$ column by
column as follows:
\[ V_{12}(:,i) = \frac{\lambda^* R_{12}(:,i)^T u_1}{\theta \|u_1\|_1} e_M, \qquad
   W_{12}(:,i) = \lambda^* R_{12}(:,i) - \theta V_{12}(:,i), \]
for all $i = 1,\ldots,n-N$. By construction we have $W_{12}(:,i)^T u_1 = 0$ for all $i = 1,\ldots,n-N$. Now consider
the requirement that $\|V_{12}(:,i)\|_\infty \le 1$ for all $i = 1,\ldots,n-N$. The requirement is equivalent to
\[ \frac{\lambda^* R_{12}(:,i)^T u_1}{\theta \|u_1\|_1} \le 1 \]
for all $i = 1,\ldots,n-N$. Subtract $\lambda^* c_3\sigma_0/\theta$ from both sides and apply the identity
$e_M^T u_1 = \|u_1\|_1$ to obtain
\[ \frac{\lambda^* (R_{12}(:,i)^T - c_3\sigma_0 e_M^T) u_1}{\theta \|u_1\|_1} \le 1 - \frac{\lambda^* c_3\sigma_0}{\theta}. \]
We will establish this inequality in two steps. First, we establish that $\lambda^* c_3\sigma_0/\theta \le 1/2$.
Because of (21), it suffices to establish that
\[ \frac{c_3(1 + \theta\sqrt{MN})}{\theta(1 + c_3 - c_5)\sqrt{MN}} \le \frac{1}{2}. \tag{25} \]
This can be rearranged into
\[ \theta \ge \left( \frac{2c_3}{1 - c_3 - c_5} \right) \frac{1}{\sqrt{MN}}, \]
which follows from (15).

Second, we establish that with probability exponentially close to 1,
\[ \frac{\lambda^* (R_{12}(:,i)^T - c_3\sigma_0 e_M^T) u_1}{\theta \|u_1\|_1} \le \frac{1}{2}. \]
Notice that $r_{ji}/\sigma_0 - c_3$ is $b$-subgaussian; thus,
\[ \frac{1}{\sigma_0} (R_{12}(:,i) - c_3\sigma_0 e_M)^T u_1 = \frac{1}{\sigma_0} \sum_{j=1}^M u_1(j) (R_{12}(j,i) - c_3\sigma_0) \]
is also $b$-subgaussian since $\|u_1\|_2 = 1$. Thus, by (10), taking
$x = (R_{12}(:,i) - c_3\sigma_0 e_M)^T u_1/\sigma_0$ and taking $t = \theta\|u_1\|_1/(2\lambda^*\sigma_0)$,
\begin{align*}
P\left( \frac{\lambda^* (R_{12}(:,i)^T - c_3\sigma_0 e_M^T) u_1}{\theta \|u_1\|_1} \ge \frac{1}{2} \right)
 &\le \exp\left( -\theta^2 \|u_1\|_1^2 / (8b^2(\lambda^*\sigma_0)^2) \right) \\
 &\le \exp\left( -c_3^2 \|u_1\|_1^2 / (2b^2) \right), \tag{26}
\end{align*}
since, as noted above, $\theta/(\lambda^*\sigma_0) \ge 2c_3$.

To proceed, we now need a lower bound for $\|u_1\|_1$. Let $F$ denote $\lambda^* A_{11} - \theta V_{11}$, which is equal
to $[\lambda^*\sigma_0(1 + c_3) - \theta] e_M e_N^T + \lambda^*(\sigma_0 P + Q)$. We know that $u_1$ is the first (left)
singular vector of $F$. Letting $X_0 = [\lambda^*\sigma_0(1 + c_3) - \theta] e_M e_N^T$ and
$E = \lambda^*(\sigma_0 P + Q)$, we then have $F = X_0 + E$, and $X_0$ is a rank-one matrix with a single nonzero
singular value equal to $(\lambda^*\sigma_0(1 + c_3) - \theta)\sqrt{MN}$ and with left singular vector
$e_M/\sqrt{M}$ and right singular vector $e_N/\sqrt{N}$. Furthermore, since $\|F\| = 1$, we know that the
singular value of $X_0$ is at least $1 - \|E\|$ by Corollary 8.6.2 of [11]. Thus, by Theorem 8.6.5 of [11],
\[ \left\| u_1 - M^{-1/2} e_M \right\| \le 4/5, \tag{27} \]
provided that $\|E\| \le 1/6$. Thus, the next step in the analysis is to show that $\|E\| \le 1/6$. This follows
from the following sequence of inequalities:
\begin{align*}
\|E\| &= \lambda^* \|\sigma_0 P + Q\| \\
      &\le \frac{c_5(1 + \theta\sqrt{MN})}{1 + c_3 - c_5} \\
      &\le 1/6,
\end{align*}
where the second line holds with high probability according to (19) and the third line follows from (22)
and (12).

Thus, we have established that $\|E\| \le 1/6$ with high probability, which in turn implies that
\begin{align*}
\|u_1\|_1 &= e_M^T u_1 \\
          &= (e_M - M^{1/2} u_1 + M^{1/2} u_1)^T u_1 \\
          &\ge M^{1/2} u_1^T u_1 - |(e_M - M^{1/2} u_1)^T u_1| \\
          &\ge M^{1/2} u_1^T u_1 - \|e_M - M^{1/2} u_1\| \cdot \|u_1\| \\
          &= M^{1/2} - M^{1/2} \|M^{-1/2} e_M - u_1\| \\
          &\ge M^{1/2} - (4/5) M^{1/2},
\end{align*}
where the last line is obtained from (27). This gives a lower bound of $M^{1/2}/5$ on $\|u_1\|_1$. Thus,
substituting this into (26) yields
\[ P\left( \frac{\lambda^* (R_{12}(:,i)^T - c_3\sigma_0 e_M^T) u_1}{\theta \|u_1\|_1} \ge \frac{1}{2} \right)
   \le \exp\left( -c_3^2 M / (50b^2) \right). \]
This shows that the entries of any one column of $V_{12}$ exceed 1 with exponentially small probability.
Applying the union bound over all the columns, we find
\[ P(\|V_{12}\|_\infty \ge 1) \le (n - N) \exp\left( -c_3^2 M / (50b^2) \right). \tag{28} \]
Thus, we have established that $\|V_{12}\|_\infty \le 1$ with probability exponentially close to 1.
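The column-by-column construction of $V_{12}$ and $W_{12}$ can be checked numerically. In the sketch below every column of $V_{12}$ is the multiple $\lambda^* R_{12}(:,i)^T u_1/(\theta\|u_1\|_1)$ of $e_M$, and the orthogonality $W_{12}^T u_1 = 0$ holds up to rounding. The positive unit vector `u1`, the noise law, and the values of `lam` and `theta` are stand-ins chosen for illustration, not quantities computed from an actual instance.

```python
import numpy as np

def block12(R12, u1, lam, theta):
    """Column-wise construction: V12(:, i) is a multiple of e_M, W12 = lam * R12 - theta * V12."""
    M = R12.shape[0]
    coeff = lam * (R12.T @ u1) / (theta * np.abs(u1).sum())   # one scalar per column
    V12 = np.outer(np.ones(M), coeff)
    W12 = lam * R12 - theta * V12
    return V12, W12

rng = np.random.default_rng(4)
M, cols, lam, theta = 25, 40, 0.05, 0.03
u1 = rng.uniform(0.5, 1.5, size=M)
u1 /= np.linalg.norm(u1)                       # positive vector with ||u1||_2 = 1
R12 = rng.uniform(0.0, 0.4, size=(M, cols))
V12, W12 = block12(R12, u1, lam, theta)
print(np.abs(W12.T @ u1).max(), np.abs(V12).max())
```

Each coefficient is $\lambda^*/\theta$ times a weighted average of the entries in the corresponding column of $R_{12}$, which is why bounding the noise mean and its fluctuation separately controls $\|V_{12}\|_\infty$.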

We now consider the matrix $W_{12}$, which can be written as $W_{12} = \lambda^* D R_{12}$, where
$D \in \mathbb{R}^{M \times M}$ is given by
\[ D = I - \frac{1}{\|u_1\|_1} e_M u_1^T. \]
Notice that we can equivalently write
\[ W_{12} = \lambda^* D (R_{12} - c_3\sigma_0 e_M e_{n-N}^T), \]
since $D e_M = 0$. The matrix $R_{12} - c_3\sigma_0 e_M e_{n-N}^T$ is a subgaussian matrix scaled by $\sigma_0$.
Furthermore,
\[ \|D\| \le 1 + \frac{\sqrt{M}}{\|u_1\|_1} \le 6, \]
with high probability, since $\|u_1\|_1 \ge M^{1/2}/5$. Thus, Lemma 5(ii) applied to $W_{12}/(\lambda^*\sigma_0)$,
taking $u = 1/(2\lambda^*\sigma_0)$, yields
\[ P(\|W_{12}\| \ge 1/2) \le \exp\left( -\left( \frac{8}{36 \cdot 81b^2 \cdot 4(\lambda^*\sigma_0)^2}
   - (\log 7)(M + n - N) \right) \right). \]
Apply the upper bound on $\lambda^*$ from (21) to obtain
\[ P(\|W_{12}\| \ge 1/2) \le \exp\left( -\left( \frac{2(1 + c_3 - c_5)^2 MN}{36 \cdot 81b^2(1 + \theta\sqrt{MN})^2}
   - (\log 7)(M + n - N) \right) \right). \]
Now finally we apply (23) to obtain
\[ P(\|W_{12}\| \ge 1/2) \le \exp\left( -\left( \frac{8c_5^2 MN}{36 \cdot 81b^2} - (\log 7)(M + n - N) \right) \right). \tag{29} \]
The same construction and analysis applies to $V_{21}$ and $W_{21}$, and the same results are obtained
except with the roles of $(M, m)$ and $(N, n)$ interchanged. Thus,
\[ P(\|V_{21}\|_\infty \ge 1) \le (m - M) \exp\left( -c_3^2 N / (50b^2) \right), \tag{30} \]
and
\[ P(\|W_{21}\| \ge 1/2) \le \exp\left( -\left( \frac{8c_5^2 MN}{36 \cdot 81b^2} - (\log 7)(N + m - M) \right) \right). \tag{31} \]
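The two properties of $D = I - e_M u_1^T/\|u_1\|_1$ used above, namely $D e_M = 0$ and $\|D\| \le 1 + \sqrt{M}/\|u_1\|_1$, can be verified for any positive unit vector. The sketch below does so for a randomly generated stand-in for $u_1$; the helper name `d_matrix` is an illustrative assumption.

```python
import numpy as np

def d_matrix(u1):
    """D = I - e_M u1^T / ||u1||_1 for a positive vector u1."""
    M = len(u1)
    return np.eye(M) - np.outer(np.ones(M), u1) / np.abs(u1).sum()

rng = np.random.default_rng(5)
M = 30
u1 = rng.uniform(0.5, 1.5, size=M)
u1 /= np.linalg.norm(u1)                       # positive vector with ||u1||_2 = 1
D = d_matrix(u1)

norm_D = np.linalg.norm(D, 2)
bound = 1.0 + np.sqrt(M) / np.abs(u1).sum()    # triangle inequality with ||e_M u1^T|| = sqrt(M)
print(np.abs(D @ np.ones(M)).max(), norm_D, bound)
```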
From the analysis of all four blocks of $V$ and $W$, we have:
\[ \|V\|_\infty = \max\{ \|V_{11}\|_\infty, \|V_{12}\|_\infty, \|V_{21}\|_\infty, \|V_{22}\|_\infty \} \le 1, \]
where $V_{11} = e_M e_N^T$. With high probability, we also have $\|W\| \le 1$ since
\[ \|W\|^2 \le \|W_{11}\|^2 + \|W_{12}\|^2 + \|W_{22}\|^2 + \|W_{21}\|^2 \le 1. \]
By the union bound, the probability of failure of the main result is at most the sum of the probabilities
of the failure at each step. Therefore, the failure of the convex relaxation to find the claimed optimal
$X$ is at most the sum of the right-hand sides of (18), (24), (28), (29), (30), and (31). We require these
probabilities to be exponentially small. We assure that (18) is exponentially small by requiring
\[ MN \ge k_1 (M + N)^{4/3}, \tag{32} \]
where
\[ k_1 > \left( (\log 7)\, 81 b^2 \right)^{4/3}. \tag{33} \]
Next, all of (24), (29), (31) are exponentially small provided that
\[ MN \ge k_2 (m + n), \tag{34} \]
where
\[ k_2 > \frac{(\log 7)\, 36 \cdot 81 b^2}{8c_5^2}. \tag{35} \]
Finally, to ensure that (28) and (30) tend to 0 exponentially fast requires that $M$ grow as fast as
$\Omega(\log(n - N))$ and similarly $N$ grows as fast as $\Omega(\log(m - M))$, but this is already a consequence of
(32) and (34).
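The proof uses three explicit restrictions on $\theta$: the lower bound $\theta \ge 2c_3/((1 - c_3 - c_5)\sqrt{MN})$ and the two upper bounds $\theta \le 1/((c_3 + c_5)\sqrt{MN})$ and $\theta \le (1 + c_3 - 3c_5)/(2c_5\sqrt{MN})$. The sketch below collects them into an admissible interval; the function name and the sample constants are illustrative assumptions, and the check that the constants satisfy $c_5 < 1/3$ and $c_3 + c_5 < 1$ follows (12).

```python
import math

def theta_interval(c3, c5, M, N):
    """Interval [lower, upper] of theta values satisfying the bounds used in the proof."""
    if not (c5 < 1.0 / 3.0 and c3 + c5 < 1.0):
        raise ValueError("constants violate (12)")
    root = math.sqrt(M * N)
    lower = 2.0 * c3 / ((1.0 - c3 - c5) * root)
    upper = min(1.0 / (c3 + c5), (1.0 + c3 - 3.0 * c5) / (2.0 * c5)) / root
    return lower, upper

lower, upper = theta_interval(c3=0.1, c5=0.1, M=100, N=100)
print(lower, upper)
```

For small $c_3$ and $c_5$ the interval is nonempty and both ends scale like $1/\sqrt{MN}$, which matches the choice of $\theta$ of order $1/(MN)^{1/2}$.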
5 Conclusions



We have shown that a convex relaxation can find a large, approximately rank-one submatrix of a much 
larger noisy matrix provided that the dimensions of the larger matrix are no larger than the square of 
the dimensions of the smaller matrix, and provided certain upper bounds are satisfied on the level of 
the noise. 

It is interesting to note that our result also applies to the maximum biclique problem, which was
introduced in Section 1 as a special case of LAROS. In particular, if $G$ is a bipartite graph $(U, V, E)$
containing a biclique given by $U^* \times V^*$, where $|U| = m$, $|V| = n$, $|U^*| = M$, $|V^*| = N$, and if the
remaining edges of $E$ (i.e., those not in $U^* \times V^*$) are inserted at random with probability 1/2, then the
$U$-to-$V$ adjacency matrix has the form (11) in which $\sigma_0 = 1$, $c_1 = c_2 = 0$, $c_3 = 1/2$,
$b = 1/(8 \log 2)^{1/2}$. (This is not quite correct since in this case $R_{11} = 0$. However, our analysis covers
this case as well.) Thus, our algorithm with parameter $\theta = O(1/(MN)^{1/2})$ finds the planted biclique when
$M \sim N$, $m \sim n$, and $M \ge \Omega(m^{1/2})$. The same result was obtained earlier by Ames and Vavasis pQ
using a different convex relaxation. Theirs has the advantage that $M$, $N$ do not need to be known or
estimated in advance, but ours solves a more general class of problems.
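A planted-biclique instance of the kind described above can be generated in a few lines. In this sketch the biclique occupies the leading $M \times N$ block of the $U$-to-$V$ adjacency matrix and every other potential edge is included independently with probability 1/2; the function name and the sizes are illustrative assumptions.

```python
import numpy as np

def planted_biclique(m, n, M, N, seed=0):
    """0/1 adjacency matrix with a planted M x N biclique in the leading block."""
    rng = np.random.default_rng(seed)
    A = (rng.random((m, n)) < 0.5).astype(float)   # each edge present with probability 1/2
    A[:M, :N] = 1.0                                # all edges of U* x V* are present
    return A

A = planted_biclique(m=100, n=100, M=20, N=20)
theta = 1.0 / np.sqrt(20 * 20)                     # a value of order 1 / (MN)^(1/2)
print(A[:20, :20].min(), A.mean(), theta)
```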